1. Tensors and Shapes

Tensors are the generalization of vectors (rank 1) and matrices (rank 2) to arbitrary rank. Rank can be defined as the number of indices required to get individual elements of a tensor. A matrix requires two indices (row, column), and is thus a rank 2 tensor. We may say in normal conversation that a matrix is a “two dimensional object” because it has rows and columns, but this is ambiguous because the row could be 6 dimensions and the columns could be 1 dimension. Always use the word rank to distinguish vectors, matrices, and higher-order tensors. The components that make up rank are called axes (plural of axis). The dimension is how many elements are in a particular axis. The shape of a tensor combines all of these. A shape is a tuple whose length is the rank and elements are the dimension of each axis.

Let’s practice our new vocabulary. A euclidian vector \((x, y, z)\) is a rank 1 tensor whose 0th axis is dimension 3. Its shape is \((3)\). Beautiful. A 5 row, 4 column matrix is now called a rank 2 tensor whose axes are dimension 5 and 4. Its shape is \((5, 4)\). The scalar (real number) 3.2 is a rank 0 tensor whose shape is \(()\).

TensorFlow has a nice visual guide to tensors.


Array and tensor are synonyms. Array is the preferred word in numpy and often used when describing tensors in python. Tensor is the mathematic equivalent.

1.1. Einstein Notation

Einstein notation is the way tensor operations can be written out. We’ll be using a simplified version, based on the einsum function available in many numerical libraries. It’s relatively simple. Each tensor is written as a lower case variable with explicit indices, like \(a_{ijk}\) for a rank 3 tensor. The reason the variable name is written in lower case is because if you fill in the indices \(a_{023}\), you get a scalar. A variable without an index, \(b\), is a scalar. There is one rule for this notation: if an index doesn’t appear on both sides, it is summed over on the one side in which it appears. Einstein notation requires both sides of the equation to be written-out, so that its clear what the input/output shapes of the operation is.

Here are some examples of writing tensor operations in Einstein notation.

Total Sum

Sum all elements of a rank 4 tensor. In Einstein notation this is:

(1.1)\[\begin{equation} a_{ijkl} = b \end{equation}\]

in normal mathematic notation, this would be

(1.2)\[\begin{equation} \sum_i \sum_j \sum_k \sum_l a_{ijkl} = b \end{equation}\]

Sum Specific Axis

Sum over last axis

(1.3)\[\begin{equation} a_{ijkl} = b_{ijk} \end{equation}\]

In normal notation:

(1.4)\[\begin{equation} \sum_l a_{ijkl} = b_{ijk} \end{equation}\]

Dot Product

In Einstein notation:

(1.5)\[\begin{equation} a_{i} b_{i} = c \end{equation}\]

In normal notation:

(1.6)\[\begin{equation} \sum_i a_{i} b_{i} = c \end{equation}\]

Notice that \(a_i\) and \(b_i\) must have the same dimension in their 0th axis in order for the sum in the dot product to be valid. This makes sense, since to compute a dot product the vectors must be the same dimension. In general, if two tensors share the same index (\(b_{ij}\), \(a_{ik}\)), then that axis must be the same dimension.

Can you write the following out in Einstein notation?

Matrix Multiplication

The matrix product of 2 rank 2 tensors

Matrix Vector Product

Apply matrix \(a\) to column vector \(b\) by multiplication. \(\mathbf{A}\vec{b}\) in linear algebra notation.

Matrix Transpose

Swap the values in a matrix to make it a transpose

1.2. Tensor Operations

Although you can specify operations in Einstein notation, it is typically not expressive enough. How would you write this operation: sum the last axis of a tensor. Without knowing the rank, you do not know how many indices you should indicate in the expression. Maybe like this?

(1.10)\[\begin{equation} a_{i_0, i_1, \ldots i_N} = a_{i_0, i_1, \ldots i_{N - 1}} \end{equation}\]

Well that’s good but what if your operation has two arguments: which axis to sum and the tensor. That would also be clumsy to write. Einstein notation is useful and we’ll see it more, but we need to think about tensor operations as analogies to functions. Tensor operations take in 1 or more tensors and output 1 or more tensors and the output shape depends on the input shape.

One of the difficult things about tensors is understanding how shape is treated in equations. For example, consider this equation:

(1.11)\[\begin{equation} g = \exp\left(a - b\right)^2 \end{equation}\]

Seems like a reasonable enough equation. But what if \(a\) is rank 3 and \(b\) is rank 1? Is \(g\) rank 1 or 3 then? Actually, this is a real example and the answer is that \(g\) is rank 4. You subtract each element of \(b\) from each element of \(a\). You could write this in Einstein notation:

(1.12)\[\begin{equation} g_{ijkl} = \exp\left(a_{ijk} - b_l\right)^2 \end{equation}\]

except this function should work on arbitrary ranked \(a\) and always ouput \(g\) being the rank of \(a + 1\). Typically the best way to express this is explicitly stating how rank and shape are treated.

1.2.1. Reduction Operations

Reduction operations reduce the rank of an input tensor. sum(a, axis=0) is an example. The axis argument means that we’re summing over the 0th axis so that it will be removed. If a is a rank 1 vector, this would leave us with a scalar. If a is a matrix, this would remove the rows so that only columns are left over. That means we would be left with column sums. You can also specify a tuple of axes to removed, which will be done in that order sum(a, axis=(0,1) ).

In addition to sum, there are min, max, any (logical or), and more. Let’s see some examples

import numpy as np

a_shape = (4, 3, 2)
a_len = 4 * 3 * 2

a = np.arange(a_len).reshape(a_shape)
(4, 3, 2)
[[[ 0  1]
  [ 2  3]
  [ 4  5]]

 [[ 6  7]
  [ 8  9]
  [10 11]]

 [[12 13]
  [14 15]
  [16 17]]

 [[18 19]
  [20 21]
  [22 23]]]

Try to guess the shape of the output tensors in the below code based on what you’ve learned.

b = np.sum(a, axis=0)
(3, 2)
[[36 40]
 [44 48]
 [52 56]]
c = np.any(a > 4, axis=1)
(4, 2)
[[False  True]
 [ True  True]
 [ True  True]
 [ True  True]]
d = np.product(a, axis=(2,1))
[       0   332640  8910720 72681840]

1.2.2. Element Operations

Default operations in python, like + - * / ^ , are also tensor operations. They preserve shape so that the ouput shape is the same as the inputs’. The input tensors must have the same shape or be able to become the same shape through Broadcasting, which is defined in the next section.

(4, 3, 2)
b = np.ones((4, 3, 2))
(4, 3, 2)
c = a * b
(4, 3, 2)

1.3. Broadcasting

One of the difficulties with the elementary operations is that they require the input tensors to have the same shape. For example, you cannot multiply a scalar (rank 0) and a vector (rank 1). This is where broadcasting comes in. Broadcasting increases the rank of one of the input tensors to be compatible with another. Broadcasting works at the last axis and works its way forward. Let’s see an example

(1.13)\[\begin{equation} A + B \end{equation}\]

Input A

Rank 2, shape is (2, 3)

 4  3  2 
-1  2  4

Input B

Rank 1, shape is (3), a vector:


Now let’s see how the broadcasting works. Broadcasting starts by lining up the shapes from the end of the tensors

Step 1: align on last axis

tensor        shape
A:             2  3
B:                3
broadcasted B: .  .

Step 2: process last axis

Now broadcasting looks at the last axis (axis 1) and if one tensor has axis dimension 1, its value is copied to match the others. In our case, they agree.

tensor        shape
A:             2  3
B:                3
broadcasted B: .  3

Step 3: process next axis

Now we examine the next axis, axis 0. B has no axis there, because its rank is too low. Broadcasting will insert a new axis by (i) inserting a new axis with dimension 1 and (ii) copying the value at this new axis until its dimension matches.

Step 3i:

Add new axis of dimension 1. This is like making \(B\) have 1 row and 3 columns:

 3  0  1

Step 3ii:

Now we copy the values of this axis until its dimension matches \(A\)’s axis 0 dimension. We’re basically copying \(b_{0j}\) to \(b_{1j}\).

 3  0  1
 3  0  1


tensor        shape
A:             2  3
B:                3
broadcasted B: 2  3

Now, we compute the result by addition elementwise.

A + B
 3 + 4  0 + 3  1 + 2  =   7  3  3
 3 - 1  0 + 2  1 + 4      2  2  5

Let’s see some more examples, but only looking at the input/output shape

A Shape

B Shape

Output Shape













Try some for yourself!

A Shape

B Shape

Output Shape




(16, 16, 3)









1.3.1. Suggested Reading for Broadcasting

You can read more about broadcastnig in the numpy tutorial or the Python Data Science Handbook.

1.4. Modifying Rank

The last example we saw brings up an interesting questions: what if we want to add a (1,4) and (16,) to end up with a (4,16) tensor? We could insert a new axis at the end of \(B\) to make its shape (16, 1). This can be done using the newaxis sytnax:

a = np.ones((1,4))
b = np.ones(16,)
result = a + b[:, np.newaxis]
(16, 4)

Just as newaxis can increase rank, we can decrease rank. One way is to just slice, like a[0]. A more genaral way is to squeeze which removes any axes that are dimension 1.

a = np.ones((1,32, 4, 1))
print('before squeese:', a.shape)
a = np.squeeze(a)
print('after squeese:', a.shape)
before squeese: (1, 32, 4, 1)
after squeese: (32, 4)

1.4.1. Reshaping

The most general way of changing rank and shape is through reshape. This allows you to reshape a tensor, as long as the number of elements remains the same. You could make a (4, 2) into an (8,). You could make a (4, 3) into a (1, 4, 3, 1). Thus it can accomplish the two tasks done by squeeze and newaxis.

There is one special syntax element to shaping: -1 dimensions. -1 can appear once in a reshape command and means to have the computer figure out what goes there by following the rule that the number of elements in the tensor must remain the same. Let’s see some examples.

a = np.arange(32)
new_a = np.reshape(a, (4, 8))
(4, 8)
new_a = np.reshape(a, (4, -1))
(4, 8)
new_a = np.reshape(a, (1, 2, 2, -1))
(1, 2, 2, 8)

1.4.2. Rank Slicing

I assume you’re familiar with slicing in numpy/python. I’ll refer you to the Python Tutorial and the numpy tutorial for a refresher if you need it. Rank Slicing is just my terminology for slicing without knowing the rank of a tensor. Use the ... (ellipsis) keyword. This allows you to account for unknown rank when slicing. Examples:

  • Access last axis: a[...,:]

  • Access last 2 axes: a[...,:,:]

  • Add new axis to end a[...,np.newaxis]

  • Add new axis to beginning a[np.newaxis,...]

Let’s see if we can put together our skills to implement the equation example from above,

(1.14)\[\begin{equation} g = \exp\left(a - b\right)^2 \end{equation}\]

for arbitrary rank \(a\). Recall \(b\) is a rank 1 tensor and we want \(g\) to be the rank of \(a + 1\).

def eq(a, b):
    return np.exp((a[...,np.newaxis] - b)**2)

b = np.ones(4)
a1 = np.ones( (4,3) )
a2 = np.ones( (4,3, 2, 1) )

g1 = eq(a1, b)
print('input a1:', a1.shape, 'output:', g1.shape)

g2 = eq(a2, b)
print('input a2:', a2.shape, 'output:', g2.shape)
input a1: (4, 3) output: (4, 3, 4)
input a2: (4, 3, 2, 1) output: (4, 3, 2, 1, 4)

1.5. Chapter Summary

  • Tensors are the building blocks of machine learrning. A tensor has a rank and shape that specifies how many elements it has and how they are arranged. An axis describes each element in the shape.

  • A euclidean vector is a rank 1 tensor with shape (3). It has 1 axis of dimension 3. A matrix is a rank 2 tensor. It has two axes.

  • Equations that describe operating on 1 or more tensors can be written using Enistein notation. Einstein notation uses indices to indicate the shape of tensors, how things are summed, and which axes must match up.

  • There are operations that reduce ranks of tensors, like sum or mean.

  • Broadcasting is an automatic tool in programming languages that modifies shapes of tensors make them compatible with operations.

  • Tensors can be reshaped or have ranked modified by newaxis,  reshape, and squeeze. These are not standard among the various numeric libraries in Python.

1.6. Exercises

1.6.1. Einstein notation

Write out the following in Einstein notation:

  1. Product of two matrices

  2. Trace of a matrix

  3. Outter product of two Euclidean vectors

  4. \(\mathbf{A}\) is a rank 3 tensor whose last axis is dimension 3 and contains Eulidean vectors. \(\mathbf{B}\) is Euclidean vector. Compute the dot product of each of the vectors in \(\mathbf{A}\) with B. So if \(\mathbf{A}\) is shape (11, 7, 3), it contains 11 \(\times\) 7 vectors and the output should be shape (11,7). \(\mathbf{B}\) is shape (3)

1.6.2. Reductions

Answer the following with Python code with reductions. Write your code to be as general as possible – being able to take arbitrary rank tensors unless it is specified that something is a vector.

  1. Normalize a vector so that the sum of its elements is 1. Note the rank of the vector should be unchanged.

  2. Normalize the last axis of a tensor

  3. Compute the mean squared error between two tensors

  4. Compute the mean squared error between the last axis of tensor \(\mathbf{A}\) and vector \(\vec{b}\)

1.6.3. Broadcasting and Shapes

  1. Consider two vectors \(\vec{a}\) and \(\vec{b}\). Using reshaping and broadcasting alone, write python code to compute their outter product.

  2. Why is the code a.reshape((-1, 3, -1)) invalid?

  3. You have a tensor of unknown rank \(\mathbf{A}\) and would like to subtract 3.5, and 2.5 from every element so that your output, which is a new tensor \(\mathbf{B}\), is rank of \(\textrm{rank}(\mathbf{A}) + 1\). The last axis of \(\mathbf{B}\) should be dimension 2.