What’s what in TensorFlow 2.0

I think everyone can agree that the new TensorFlow 2.0 is a revolution rather than an evolution. It has greatly simplified almost every aspect of the clunky TF1. And while the TensorFlow developers made it easier to transition to the new framework by creating the TF2 upgrade script, they have undeniably complicated things a bit for newcomers. We now live in a world of countless code samples, StackOverflow snippets and bits of information that, at least in the beginning, are hard to navigate. You never know whether you are looking at TensorFlow 1, TensorFlow 2, or something in-between, because, to make things worse, there was an in-between phase too.

So, before we go any further, here is what you will learn from this article:

  • How to distinguish new TensorFlow 2.0 code from the old 1.x (handy when you look for solutions on StackOverflow)
  • How and why Keras, everyone’s favourite high-level TF API, got integrated into the new TensorFlow
  • How Eager mode (now enabled by default) improved our experimenting experience and how it freed us from the old TF1 Graph
  • How Gradient Tape both simplified debugging and allowed for easier extensibility of our network models

I hope you will find this useful. Let’s dig in.

Keras

Keras has long been a great tool for prototyping and for learning how to design neural networks. The 2018 edition of the Deep Learning Framework Power Scores clearly showed that we live in a TensorFlow/Keras-dominated world. This makes sense: no one has invested as much in A.I. in the open-source community as Google. Keras, on the other hand, has made it easier to begin working with neural networks. So, what is Keras? It is a high-level abstraction over TensorFlow. It uses TensorFlow as a computation engine while hiding all of its complexities. This allows users to create neural networks in just a couple of lines:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation

# A small fully connected network: 784 inputs, one hidden layer, 10 output classes
model = Sequential([
    Dense(32, input_shape=(784,)),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])
Or, equivalently, by using the .add() method like so:
model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))

As you can see, that’s a fairly simple way of declaring model inputs and layers. You can find many more Keras examples in the official Keras documentation. Google engineers have integrated Keras into TensorFlow 2.0 and have removed or modified parts of the TensorFlow codebase to use the existing, often more elegant, solutions from the Keras framework. So how will you know you are looking at TensorFlow 2.0 code? By looking at the imports. Keras is now imported directly from the TensorFlow module, so if you see

from tensorflow import keras

or

tf.keras

anywhere in the code, you know this is new TensorFlow 2.0 code. It’s easily distinguishable from the older standalone Keras, which was its own separate package:

from keras.models import Sequential
from keras.layers import Dense, Activation

Eager by Default

To understand what influenced the creation of eager mode, you first need to understand the old days of TF1. The core principle there was the Graph. You would write your code by declaring some TF operations and variables, and when you ran it, TensorFlow would compile a Graph from it. This Graph had a lot of limitations and was fairly hard to debug. Since your code was only an abstract representation of the Graph, you couldn’t just step through it; you would have to analyse the compiled Graph itself, which could get quite complicated.
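
To make the contrast concrete, here is a minimal conceptual sketch of the same computation written in both styles. The first half assumes the TF 1.x API (or tf.compat.v1 in TF2) and is shown only for comparison, not as code you would write today:

import tensorflow as tf

# TF 1.x style: first build a Graph, then run it inside a Session
a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
c = a + b                                            # only adds a node to the graph
with tf.Session() as sess:
    print(sess.run(c, feed_dict={a: 2.0, b: 3.0}))   # 5.0

# TensorFlow 2.0 eager style: the same computation runs immediately
x = tf.constant(2.0)
y = tf.constant(3.0)
print(x + y)                                         # tf.Tensor(5.0, shape=(), dtype=float32)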

Now, in the new world of TensorFlow 2.0, Graphs are no longer first-class citizens. They have been pushed back by new tools like the GradientTape module and the tf.function Python decorator (we will talk about both in more detail later). What eager execution essentially allows is stepping through the code as it trains the network, so we can closely observe any misbehaving code in the learning pipeline. This is a big deal for more complex and custom models as it:

  • Allows us to structure our code better, without being restricted to the available TF Operations
  • Allows us to use the standard Python debugger to inspect and change running code
  • Allows us to use standard Python control flow such as if statements and while loops instead of TF1’s conditional operators like tf.cond and tf.while_loop (see the short sketch right after this list)
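
To illustrate that last point, here is a tiny sketch (with made-up values) of ordinary Python control flow operating directly on eager tensors, something that in TF1 graph mode would have required tf.cond:

import tensorflow as tf

x = tf.constant(4.0)
# A plain Python `if` works here because eager tensors carry concrete values
if x > 3:
    y = x * 2
else:
    y = x
print(y)  # tf.Tensor(8.0, shape=(), dtype=float32)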

Below you will see some very intuitive code that you can step through and debug:

import tensorflow as tf

# Assumes a simple Linear model class plus loss() and grad() helpers
# (as in the official eager execution tutorial) are defined elsewhere.
model = Linear()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

print("Initial loss: {:.3f}".format(loss(model, training_inputs, training_outputs)))

steps = 300
for i in range(steps):
  # Compute the gradients of the loss with respect to the model's weights W and bias B
  grads = grad(model, training_inputs, training_outputs)
  optimizer.apply_gradients(zip(grads, [model.W, model.B]))
  if i % 20 == 0:
    print("Loss at step {:03d}: {:.3f}".format(i, loss(model, training_inputs, training_outputs)))

You can inspect and debug the gradients (the values used to update the weights of each layer’s neurons) on every learning step, and you can apply any custom math operation to them too, as shown in the sketch below. As you can imagine, this adds great value when experimenting with and debugging new, clever models and architectures.
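
For instance, here is a hedged sketch of applying one such custom operation, gradient clipping, before the update. It reuses the hypothetical grad() helper and Linear model from the loop above:

# Transform the raw gradients before the optimizer ever sees them
grads = grad(model, training_inputs, training_outputs)
clipped = [tf.clip_by_norm(g, 1.0) for g in grads]
optimizer.apply_gradients(zip(clipped, [model.W, model.B]))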

Gradient Tape

First things first, a quick reminder of why gradients are important. Gradients are the partial derivatives of the loss with respect to the trainable weights in every layer of our network; they tell the optimizer how to adjust those weights on every learning step. This is why it is important that gradients are tracked on every step of our learning loop. Gradient tape is a very important tool that enabled eager mode to be so successful. To put it shortly, GradientTape took over the old TF1 Graph’s responsibility for tracking the gradients and other trainable variables during the learning process. GradientTape is a sort of recorder that keeps track of every operation applied to the trainable variables inside its scope, so that gradients can be computed afterwards. Consider the code below:

def step(self, model, inputs, target):
    # Record the forward pass so the tape can compute gradients afterwards
    with tf.GradientTape() as tape:
        predictions = model(inputs)
        loss = self.loss_function(target, predictions)

    # Differentiate the loss with respect to the model's trainable variables
    grads = tape.gradient(loss, model.variables)
    self.optimizer.apply_gradients(zip(grads, model.variables))

    return loss

Here we can see a single training step definition running in the default Eager mode, meaning we can easily debug every line and see what is happening. The code records the forward pass inside a GradientTape scope and then:

  • Use the model to produce prediction values for this single forward pass.
  • Use a previously defined loss function to calculate the loss value, i.e. the difference between what the network predicted and what we know it should predict (based on the labels from our training dataset).
  • Calculate new gradients by differentiating the loss with respect to the model’s trainable variables, which are exposed through the model.variables property.
  • Use the previously defined optimizer to apply those gradients to the model’s variables (zip pairs each gradient with its variable, producing the iterator of tuples the optimizer expects).

In comparison to the old Graph days, this is almost a magical tool. GradientTape allows us to better understand and influence the process of computing and applying gradients, which is the learning process itself.
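
As a minimal standalone illustration of how the tape records operations, consider this short sketch:

import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x                    # the tape records every operation applied to x
dy_dx = tape.gradient(y, x)      # dy/dx = 2x = 6.0 at x = 3
print(dy_dx)                     # tf.Tensor(6.0, shape=(), dtype=float32)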

What’s next?

I hope you found this article useful and that it will help you both understand and switch to TensorFlow 2.0 in the near future. In my next article, I will talk about more of TensorFlow 2.0’s equally exciting features. What you will learn there is:

  • How subclassing allowed us to write cleaner code and at the same time made it easier to create custom Models and Layers
  • How the tf.function decorator allowed us to create more maintainable mini graphs (don’t worry, unlike the old Graphs they can still be stepped through and debugged)
  • How much easier it got to navigate the Framework after the namespaces cleanup (it is more exciting than it sounds)

Stay tuned for my next article and as always feel free to leave your thoughts in the comments below.
