PyTorch Summer Hackathon 2020

We participated in the PyTorch Summer Hackathon 2020 with our Stockroom project! We talked about Stockroom in both the previous heartbeat and the very first one: it’s a very lightweight ML lifecycle tool built on git and Hangar. If you are not familiar with the project, be sure to check it out.

Our Sherin also made an introductory video with a step-by-step walkthrough of how to set up a Stockroom repository, import a dataset from torchvision, and save experiment metadata and model weights while training a neural network.

If you are curious, here is the official submission 👉

Did you know that … Data Is Large, We Don’t Waste Space

When a user asks to add data to a Hangar repository, one of the first things Hangar does is generate a hash of the array contents. If that hash does not match any data already stored in the repository, the data is sent to the appropriate storage backend to be written.

On the other hand, if the hash of the new sample is already recorded in the repository, we don’t involve the storage backend at all. All we need to do is record that a new sample with that hash was added to a column. It makes no sense to write the same data twice.
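
To make the idea concrete, here is a minimal sketch of content-addressed deduplication in plain Python. It is not Hangar's actual code: the `ToyContentStore` class, its attributes, and the choice of `blake2b` are assumptions made purely for illustration, and a dictionary stands in for the storage backend.

```python
import hashlib


class ToyContentStore:
    """Hypothetical stand-in for a content-addressed store (not Hangar's API)."""

    def __init__(self):
        self.backend = {}        # digest -> raw bytes; plays the role of the storage backend
        self.columns = {}        # column name -> {sample key: digest}
        self.backend_writes = 0  # counts how often the "backend" is actually written to

    def add(self, column, key, data: bytes) -> str:
        # Hash the contents first; the digest identifies the data.
        digest = hashlib.blake2b(data, digest_size=20).hexdigest()
        if digest not in self.backend:
            # Unseen contents: this is the only path that touches the backend.
            self.backend[digest] = data
            self.backend_writes += 1
        # In every case, record that this sample in this column has that digest.
        self.columns.setdefault(column, {})[key] = digest
        return digest

    def get(self, column, key) -> bytes:
        # Resolve sample -> digest -> stored bytes.
        return self.backend[self.columns[column][key]]


store = ToyContentStore()
first = store.add("images", "sample_0", b"\x00\x01\x02\x03")
second = store.add("images", "sample_1", b"\x00\x01\x02\x03")  # same contents, new sample
print(first == second, store.backend_writes)
```

In this toy version, adding the same bytes under two sample names produces two records but a single backend write, which is the space saving described above.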

Read more about the Hangar design in the official documentation 👉

