Jan '20 [tensor]werk Heartbeat

Feb 01, 2020

Hey Folks! This is our very first [tensor]werk Heartbeat and we are very excited to start sharing what we do at [tensor]werk and what’s on our minds!

Every month we will share news on the projects we are working on, the conferences and events we attend, our plans for the future, and anything else related to data.

Welcome

In this first Heartbeat, we thought it would be good to introduce ourselves and the projects we are working on. Although the company is new, we already have several interesting pieces to show.

Who we are

So, [tensor]werk: we are a small group of people coming from software development and data science. While [tensor]werk is incorporated in NY, we are a distributed team, currently spanning the US, India and Italy.

Our dream is to build tools for developers of systems whose behavior is learnt from data, rather than coded top-down. We call it data-defined software (this concept has been well captured by Andrej Karpathy in his Software 2.0 post from a while back).

As machine learning enters the realm of software development, the best software engineering practices developed over the course of decades need to be recast for this new paradigm. Many questions arise:

If data is the new source code, how can we keep track of it as it changes over time?
How do we work collaboratively on data-defined software?
How can we test data-defined software and what is TDD in this context?
How can we audit data-defined software and describe its behavior?
How do we factor data-defined software into building blocks and compose them at scale?
How do we build and maintain production systems at scale?

Of course, we’re not the only ones mulling this over. At the same time, there’s so much to do and a lot of space to get creative.

Now, how does all this translate into practice? At [tensor]werk we are currently working on a suite of tools, each with its own focus and each filling a gap in the current tooling landscape.

What’s on our plate

Here is an overview of the tools we are building at present:

RedisAI is a Redis module for serving tensors and executing deep learning models, developed in collaboration with Redis Labs. It turns Redis into a multi-backend deep learning runtime (it currently supports PyTorch, TensorFlow, TFLite, ONNX and ONNX-ML on CPU and GPU) while retaining the operational simplicity of Redis. With RedisAI, you can put your model into production in minutes.
We recently integrated RedisAI as a deployment target for MLflow, and more integrations with lifecycle management tools are on the way.
Hangar is a Python module and CLI providing versioning for numerical data. Think git for tensors, or multidimensional arrays. It is designed to solve many of the problems tackled by version control systems for source code, adapted to numerical data:
- time traveling through the history of a dataset
- zero cost branching
- merging and conflict resolution
- cloning, pushing, pulling to/from remotes
Working with Hangar is convenient. One can work with a huge dataset and materialize only part of it locally. Or work collaboratively on a dataset, making sure no changes get lost. Or train models on different branches from different processes, all against the same Hangar repo.
On top of all this, Hangar provides fast access to data and can compress data very effectively. Using it from PyTorch or TensorFlow, instead of data files, is literally a one-liner.
Stockroom is a very recent development. It is a tool built on top of Hangar and git to version models, data, (hyper)parameters and metrics alongside your model source code. It’s a natural complement to Hangar, which is focused on versioning numerical data. At the same time, it provides a simpler surface for users to track their experiments without necessarily becoming Hangar or git ninjas. We are working hard to ship an alpha release soon!
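
To make the "git for tensors" idea behind Hangar a bit more concrete, here is a toy sketch of content-addressed versioning with cheap branching. It is not Hangar's API or implementation: the `ToyTensorRepo` class, its method names and the use of plain tuples of floats in place of real arrays are all assumptions made purely for illustration.

```python
import hashlib
import json


class ToyTensorRepo:
    """Toy content-addressed store for numeric samples (hypothetical, not Hangar)."""

    def __init__(self):
        self.blobs = {}                    # digest -> tuple of floats, stored once
        self.commits = {}                  # commit id -> {"parent", "samples"}
        self.branches = {"master": None}   # branch name -> head commit id

    def commit(self, branch, samples):
        parent = self.branches[branch]
        # Start from the parent's snapshot so unchanged samples are shared.
        snapshot = dict(self.commits[parent]["samples"]) if parent else {}
        for name, values in samples.items():
            data = tuple(float(v) for v in values)
            digest = hashlib.sha1(json.dumps(data).encode()).hexdigest()
            self.blobs.setdefault(digest, data)  # identical data is stored only once
            snapshot[name] = digest
        payload = json.dumps({"parent": parent, "samples": snapshot}, sort_keys=True)
        commit_id = hashlib.sha1(payload.encode()).hexdigest()
        self.commits[commit_id] = {"parent": parent, "samples": snapshot}
        self.branches[branch] = commit_id
        return commit_id

    def branch(self, new, source):
        # Branching copies a single pointer and never touches the data.
        self.branches[new] = self.branches[source]

    def checkout(self, ref):
        # ref may be a branch name or a commit id ("time travel").
        commit_id = self.branches.get(ref, ref)
        snapshot = self.commits[commit_id]["samples"]
        return {name: self.blobs[digest] for name, digest in snapshot.items()}


repo = ToyTensorRepo()
first = repo.commit("master", {"img0": [0.0, 1.0], "img1": [2.0, 3.0]})
repo.branch("relabel", "master")
repo.commit("relabel", {"img1": [9.0, 9.0]})
```

In this sketch, `repo.checkout("master")` still returns the original `img1`, while `repo.checkout("relabel")` returns the modified one, and the untouched `img0` is stored only once across both branches. A real system also has to deal with on-disk storage, merging and remotes, which this toy leaves out entirely.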

What’s going on

After months of refining our vision and developing core projects, we are getting close to releasing RedisAI and Hangar in general availability, meaning that users can start trusting these tools for production use.

The tooling landscape is very wide, and although we are filling very specific gaps, there’s work we need to do to make our tools known to our potential user base. A website would help. Right, we are working on that too :-)

Reach out

If you’d like to have a peek into our vision and our upcoming developments, please send us a note at info@tensorwerk.com. In any case, we will be posting our updates regularly here on Substack. Have fun and stay tuned.

If you want to stay up to date with ideas, projects and plans for the future at [tensor]werk, subscribe to our publication and receive the Heartbeat directly in your inbox.
