Aug '20 tensorwerk Heartbeat
Hey there! This month we are proud to present our submission to the PyTorch Summer Hackathon 2020 and a glimpse of the Hangar data storage design.
Every month we share news about the projects we are working on, the conferences and events we attend, our plans for the future, and anything else related to data.
PyTorch Summer Hackathon 2020
We participated in the PyTorch Summer Hackathon 2020 with our Stockroom project! We talked about Stockroom in both the last and the very first Heartbeat: it’s a very lightweight ML lifecycle tool based on Git and Hangar. If you are not familiar with the project, be sure to check it out.
Our Sherin also made an introductory video with a step-by-step walkthrough of how to set up a Stockroom repository, how to import a dataset from torchvision, and how to save experiment metadata and model weights while training a neural network.
If you are curious, here is the official submission 👉 https://devpost.com/software/stockroom
Did you know that … Data Is Large, We Don’t Waste Space
When a user requests to add data to a Hangar repository, one of the first operations Hangar does is to generate a hash of the array contents. If the hash does not match a piece of data already placed in the Hangar repository, the data is sent to the appropriate storage backend methods.
On the other hand, if a sample is added to a repository that already has a record of its hash, we don’t even involve the storage backend. All we need to do is record that a new sample with that hash was added to a column. It makes no sense to write the same data twice.
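To make the idea concrete, here is a minimal sketch of hash-based deduplication. It is a toy illustration, not Hangar's actual code: the `DedupStore` class, its method names, the in-memory dictionaries and the choice of BLAKE2b are all assumptions made for this example, and raw bytes stand in for array contents.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store (hypothetical, not Hangar's implementation)."""

    def __init__(self):
        self.backend = {}        # digest -> raw bytes; stands in for a storage backend
        self.columns = {}        # column name -> {sample key -> digest}
        self.backend_writes = 0  # how many times the backend was actually written to

    def add(self, column, key, data: bytes) -> str:
        # Step 1: hash the contents (the hash function is an assumption here).
        digest = hashlib.blake2b(data, digest_size=20).hexdigest()
        # Step 2: only unseen contents are handed to the storage backend.
        if digest not in self.backend:
            self.backend[digest] = data
            self.backend_writes += 1
        # Step 3: always record that this sample in this column has that hash.
        self.columns.setdefault(column, {})[key] = digest
        return digest

    def get(self, column, key) -> bytes:
        # Resolve the sample to its digest, then fetch the stored contents.
        return self.backend[self.columns[column][key]]


store = DedupStore()
payload = bytes(range(16))
first = store.add("images", "sample_0", payload)
second = store.add("images", "sample_1", payload)               # same contents
third = store.add("images", "sample_2", b"different contents")  # new contents
```

Here `sample_0` and `sample_1` share a single stored copy, so the backend is written to only twice for three samples. A real system working with arrays would also need to fold the dtype and shape into the hash, which this sketch leaves out.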
Read more about the Hangar design in the official documentation 👉 https://hangar-py.readthedocs.io/en/stable/design.html
Reach out
If you’d like to have a peek into our vision and our upcoming developments, please send us a note at info@tensorwerk.com. In any case, we will be posting our updates regularly here on Substack. Have fun and stay tuned.
If you want to stay up to date with ideas, projects and plans for the future at tensorwerk, subscribe to our publication and receive the Heartbeat directly in your inbox.