1. Tensors and Shapes
Tensors are the generalization of vectors (rank 1) and matrices (rank 2) to arbitrary rank. Rank can be defined as the number of indices required to get individual elements of a tensor. A matrix requires two indices (row, column), and is thus a rank 2 tensor. We may say in normal conversation that a matrix is a “two dimensional object” because it has rows and columns, but this is ambiguous because the rows could have dimension 6 and the columns dimension 1. Always use the word rank to distinguish vectors, matrices, and higher-order tensors. The components that make up rank are called axes (plural of axis). The dimension is how many elements are in a particular axis. The shape of a tensor combines all of these. A shape is a tuple whose length is the rank and whose elements are the dimension of each axis.
Let’s practice our new vocabulary. A Euclidean vector \((x, y, z)\) is a rank 1 tensor whose 0th axis is dimension 3. Its shape is \((3)\). Beautiful. A 5 row, 4 column matrix is now called a rank 2 tensor whose axes are dimension 5 and 4. Its shape is \((5, 4)\). The scalar (real number) 3.2 is a rank 0 tensor whose shape is \(()\).
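The vocabulary above can be checked directly in NumPy. This is a minimal sketch; the variable names and the values in the arrays are made up for illustration. NumPy calls the rank `ndim` and reports the shape as a Python tuple.

```python
import numpy as np

# Hypothetical example tensors matching the vocabulary above
scalar = np.array(3.2)               # rank 0
vector = np.array([1.0, 2.0, 3.0])   # rank 1, a Euclidean vector (x, y, z)
matrix = np.zeros((5, 4))            # rank 2, 5 rows and 4 columns

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    # ndim is the rank; shape holds one dimension per axis
    print(name, "rank:", t.ndim, "shape:", t.shape)
```

Note that Python prints the vector's shape as `(3,)`, since a one-element tuple needs the trailing comma.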
TensorFlow has a nice visual guide to tensors.
Note
Array and tensor are synonyms. Array is the preferred word in NumPy and is often used when describing tensors in Python. Tensor is the mathematical equivalent.
1.1. Einstein Notation
Einstein notation is a way to write out tensor operations. We’ll be using a simplified version, based on the `einsum` function available in many numerical libraries. It’s relatively simple. Each tensor is written as a lower case variable with explicit indices, like \(a_{ijk}\) for a rank 3 tensor. The variable name is written in lower case because if you fill in the indices, as in \(a_{023}\), you get a scalar. A variable without an index, \(b\), is a scalar. There is one rule for this notation: if an index doesn’t appear on both sides, it is summed over on the one side in which it appears. Einstein notation requires both sides of the equation to be written out, so that it’s clear what the input/output shapes of the operation are.
Here are some examples of writing tensor operations in Einstein notation.
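Total Sum
Sum all elements of a rank 4 tensor. In Einstein notation this is:
\[ a_{ijkl} = b \]
In normal mathematical notation, this would be:
\[ \sum_i \sum_j \sum_k \sum_l a_{ijkl} = b \]
Sum Specific Axis
Sum over the last axis:
\[ a_{ijkl} = b_{ijk} \]
In normal notation:
\[ \sum_l a_{ijkl} = b_{ijk} \]
Dot Product
In Einstein notation:
\[ a_i b_i = c \]
In normal notation:
\[ \sum_i a_i b_i = c \]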
Notice that \(a_i\) and \(b_i\) must have the same dimension in their 0th axis in order for the sum in the dot product to be valid. This makes sense, since to compute a dot product the vectors must be the same dimension. In general, if two tensors share the same index (\(b_{ij}\), \(a_{ik}\)), then that axis must be the same dimension.
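The three examples above map onto NumPy's `einsum` function. The sketch below uses made-up random tensors; in the `einsum` string the input indices go before `->` and the output indices after it, so any index missing after the arrow is summed over.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(2, 3, 4, 5))  # a rank 4 tensor with arbitrary values

# Total sum: i, j, k, l appear only on the input side, so all are summed
total = np.einsum("ijkl->", a)

# Sum over the last axis: only l is missing from the output
last_axis_sum = np.einsum("ijkl->ijk", a)

# Dot product: i is shared by both inputs and absent from the output
u = rng.normal(size=3)
v = rng.normal(size=3)
dot = np.einsum("i,i->", u, v)

print(np.isclose(total, a.sum()))
print(last_axis_sum.shape)
print(np.isclose(dot, np.dot(u, v)))
```

If `u` and `v` had different dimensions on their shared index, `einsum` would raise an error, which is the matching-axis requirement described above.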
Can you write the following out in Einstein notation?
Matrix Multiplication
The matrix product of two rank 2 tensors.
Answer
\[ a_{ij} b_{jk} = c_{ik} \]
Matrix Vector Product
Apply matrix \(a\) to column vector \(b\) by multiplication. \(\mathbf{A}\vec{b}\) in linear algebra notation.
Answer
\[ a_{ij} b_j = c_i \]
Matrix Transpose
Swap the rows and columns of a matrix to form its transpose.
Answer
\[ a_{ij} = b_{ji} \]
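One way to check answers to the three questions above is to write them as `einsum` strings and compare against NumPy's built-in operations. The index strings below are one possible answer, and the arrays are made-up random values.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=(3, 4))
b = rng.normal(size=(4, 2))
v = rng.normal(size=4)

# Matrix multiplication: j is shared and summed, leaving indices i and k
matmul = np.einsum("ij,jk->ik", a, b)

# Matrix vector product: j is summed, leaving index i
matvec = np.einsum("ij,j->i", a, v)

# Transpose: nothing is summed, the two indices just swap places
transpose = np.einsum("ij->ji", a)

print(np.allclose(matmul, a @ b))
print(np.allclose(matvec, a @ v))
print(np.allclose(transpose, a.T))
```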
1.2. Tensor Operations
Although you can specify operations in Einstein notation, it is typically not expressive enough. How would you write this operation: sum the last axis of a tensor? Without knowing the rank, you do not know how many indices you should indicate in the expression. Maybe like this?
\[ a_{i_0 i_1 \ldots i_N} = b_{i_0 i_1 \ldots i_{N-1}} \]
Well that’s good, but what if your operation has two arguments: which axis to sum and the tensor? That would also be clumsy to write. Einstein notation is useful and we’ll see it more, but we need to think about tensor operations as analogous to functions. Tensor operations take in 1 or more tensors and output 1 or more tensors, and the output shape depends on the input shape.
One of the difficult things about tensors is understanding how shape is treated in equations. For example, consider this equation:
\[ g = \exp\left((a - b)^2\right) \]
Seems like a reasonable enough equation. But what if \(a\) is rank 3 and \(b\) is rank 1? Is \(g\) rank 1 or 3 then? Actually, this is a real example and the answer is that \(g\) is rank 4. You subtract each element of \(b\) from each element of \(a\). You could write this in Einstein notation:
\[ g_{ijkl} = \exp\left((a_{ijk} - b_l)^2\right) \]
except this function should work on arbitrary ranked \(a\) and always output \(g\) with rank one greater than that of \(a\). Typically the best way to express this is by explicitly stating how rank and shape are treated.
1.2.1. Reduction Operations
Reduction operations reduce the rank of an input tensor. `sum(a, axis=0)` is an example. The axis argument means that we’re summing over the 0th axis so that it will be removed. If `a` is a rank 1 vector, this would leave us with a scalar. If `a` is a matrix, this would remove the rows so that only columns are left over. That means we would be left with column sums. You can also specify a tuple of axes to remove, which will be done in that order: `sum(a, axis=(0,1))`.
In addition to `sum`, there are `min`, `max`, `any` (logical or), and more. Let’s see some examples.
import numpy as np
a_shape = (4, 3, 2)
a_len = 4 * 3 * 2
a = np.arange(a_len).reshape(a_shape)
print(a.shape)
print(a)
(4, 3, 2)
[[[ 0 1]
[ 2 3]
[ 4 5]]
[[ 6 7]
[ 8 9]
[10 11]]
[[12 13]
[14 15]
[16 17]]
[[18 19]
[20 21]
[22 23]]]
Try to guess the shape of the output tensors in the below code based on what you’ve learned.
b = np.sum(a, axis=0)
print(b.shape)
print(b)
(3, 2)
[[36 40]
[44 48]
[52 56]]
c = np.any(a > 4, axis=1)
print(c.shape)
print(c)
(4, 2)
[[False True]
[ True True]
[ True True]
[ True True]]
d = np.prod(a, axis=(2,1))
print(d.shape)
print(d)
(4,)
[ 0 332640 8910720 72681840]
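The other reductions mentioned above follow the same shape rule: every axis named in `axis` disappears from the output shape. This short sketch rebuilds the same (4, 3, 2) tensor so it can run on its own.

```python
import numpy as np

a = np.arange(4 * 3 * 2).reshape((4, 3, 2))

# Removing axis 0 (dimension 4) leaves shape (3, 2)
min_shape = np.min(a, axis=0).shape
# Removing the last axis (dimension 2) leaves shape (4, 3)
max_shape = np.max(a, axis=-1).shape
# Removing axes 0 and 2 leaves only axis 1, shape (3,)
sum_shape = np.sum(a, axis=(0, 2)).shape
# With no axis argument every axis is reduced, giving a rank 0 result
total = np.sum(a)

print(min_shape, max_shape, sum_shape, total.shape)
```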
1.2.2. Element Operations
Default operations in Python, like `+`, `-`, `*`, `/`, and `**`, are also tensor operations. They preserve shape so that the output shape is the same as the inputs’. The input tensors must have the same shape or be able to become the same shape through broadcasting, which is defined in the next section.
a.shape
(4, 3, 2)
b = np.ones((4, 3, 2))
b.shape
(4, 3, 2)
c = a * b
c.shape
(4, 3, 2)
1.3. Broadcasting
One of the difficulties with the elementary operations is that they require the input tensors to have the same shape. Under that rule alone, you could not even multiply a scalar (rank 0) and a vector (rank 1). This is where broadcasting comes in. Broadcasting increases the rank of one of the input tensors to be compatible with another. Broadcasting starts at the last axis and works its way forward. Let’s see an example.
Input A
Rank 2, shape is (2, 3)
A:
4 3 2
1 2 4
Input B
Rank 1, shape is (3), a vector:
B:
3
0
1
Now let’s see how the broadcasting works. Broadcasting starts by lining up the shapes from the end of the tensors.
Step 1: align on last axis
tensor          shape
A:              2 3
B:                3
broadcasted B:  . .
Step 2: process last axis
Now broadcasting looks at the last axis (axis 1). If one tensor has dimension 1 on that axis, its values are copied to match the other. In our case, both have dimension 3, so they already agree.
tensor          shape
A:              2 3
B:                3
broadcasted B:  . 3
Step 3: process next axis
Now we examine the next axis, axis 0. B has no axis there, because its rank is too low. Broadcasting handles this by (i) inserting a new axis with dimension 1 and (ii) copying the values along this new axis until its dimension matches.
Step 3i:
Add new axis of dimension 1. This is like making \(B\) have 1 row and 3 columns:
B:
3 0 1
Step 3ii:
Now we copy the values of this axis until its dimension matches \(A\)’s axis 0 dimension. We’re basically copying \(b_{0j}\) to \(b_{1j}\).
B:
3 0 1
3 0 1
Final
tensor          shape
A:              2 3
B:                3
broadcasted B:  2 3
Now, we compute the result by elementwise addition.
A + B
3 + 4   0 + 3   1 + 2   =   7 3 3
3 + 1   0 + 2   1 + 4       4 2 5
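The worked example can be reproduced in NumPy. This is a sketch of the steps done by hand with `np.newaxis` and `np.broadcast_to`; in practice `A + B` performs them implicitly. The names `B_row` and `B_full` are only for illustration.

```python
import numpy as np

A = np.array([[4, 3, 2],
              [1, 2, 4]])
B = np.array([3, 0, 1])

# Step 3i: insert a new leading axis of dimension 1, giving shape (1, 3)
B_row = B[np.newaxis, :]
# Step 3ii: copy along that axis until it matches A, giving shape (2, 3)
B_full = np.broadcast_to(B_row, A.shape)

print(B_full)
print(A + B_full)
# Letting NumPy broadcast on its own gives the same result
print(np.array_equal(A + B, A + B_full))
```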
Let’s see some more examples, but only looking at the input/output shape
A Shape | B Shape | Output Shape
--- | --- | ---
(4,2) | (4,1) | (4,2)
(4,2) | (2,) | (4,2)
(16,1,3) | (4,3) | (16,4,3)
(16,3,3) | (4,1) | Fail
Try some for yourself!
A Shape | B Shape | Output Shape
--- | --- | ---
(7,4,3) | (1,) | ?
(16,16,3) | (3,) | ?
(2,4,5) | (5,4,1) | ?
(1,4) | (16,) | ?
Answer
A Shape | B Shape | Output Shape
--- | --- | ---
(7,4,3) | (1,) | (7,4,3)
(16,16,3) | (3,) | (16,16,3)
(2,4,5) | (5,4,1) | Fail
(1,4) | (16,) | Fail
1.3.1. Suggested Reading for Broadcasting
You can read more about broadcasting in the NumPy tutorial or the Python Data Science Handbook.
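Broadcast shapes like those in the tables above can also be checked without building any arrays. The sketch below assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available; the `"Fail"` marker is just a label chosen here for incompatible pairs.

```python
import numpy as np

pairs = [
    ((4, 2), (4, 1)),
    ((4, 2), (2,)),
    ((16, 1, 3), (4, 3)),
    ((16, 3, 3), (4, 1)),
]

results = []
for a_shape, b_shape in pairs:
    try:
        out = np.broadcast_shapes(a_shape, b_shape)
    except ValueError:
        # Raised when two aligned axes disagree and neither has dimension 1
        out = "Fail"
    results.append(out)
    print(a_shape, b_shape, "->", out)
```

The last pair fails because, after aligning from the end, the axes of dimension 3 and 4 disagree and neither is 1.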
1.4. Modifying Rank
The last example we saw brings up an interesting question: what if we want to add a (1,4) and a (16,) tensor to end up with a (16,4) tensor? We could insert a new axis at the end of \(B\) to make its shape (16, 1). This can be done using the `newaxis` syntax:
a = np.ones((1,4))
b = np.ones(16,)
result = a + b[:, np.newaxis]
result.shape
(16, 4)
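Where the new axis goes decides the output shape. The following sketch compares a new axis at the end and at the start of the same vector, and shows one possible way (transposing `a`) to get the axes in the opposite order.

```python
import numpy as np

a = np.ones((1, 4))
b = np.ones(16)

# New axis at the end: shape (16, 1)
b_col = b[:, np.newaxis]
# New axis at the start: shape (1, 16)
b_row = b[np.newaxis, :]
print(b_col.shape, b_row.shape)

# (1, 4) with (16, 1) broadcasts to (16, 4)
print((a + b_col).shape)
# Transposing a to (4, 1) and adding b of shape (16,) gives (4, 16)
print((a.T + b).shape)
```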
Just as `newaxis` can increase rank, we can decrease rank. One way is to just slice, like `a[0]`. A more general way is `squeeze`, which removes any axes that are dimension 1.
a = np.ones((1,32, 4, 1))
print('before squeeze:', a.shape)
a = np.squeeze(a)
print('after squeeze:', a.shape)
before squeeze: (1, 32, 4, 1)
after squeeze: (32, 4)
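Slicing and `squeeze` differ in how many axes they remove. As a small sketch on the same shape: slicing drops one axis, `squeeze` with no arguments drops every axis of dimension 1, and `squeeze` also accepts an `axis` argument to target a single one.

```python
import numpy as np

a = np.ones((1, 32, 4, 1))

# Slicing index 0 of the first axis removes only that axis
sliced = a[0]
print(sliced.shape)            # (32, 4, 1)

# squeeze with no axis argument removes every axis of dimension 1
squeezed_all = np.squeeze(a)
print(squeezed_all.shape)      # (32, 4)

# squeeze can also target one axis, leaving the other in place
squeezed_last = np.squeeze(a, axis=-1)
print(squeezed_last.shape)     # (1, 32, 4)
```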
1.4.1. Reshaping
The most general way of changing rank and shape is through `reshape`. This allows you to reshape a tensor, as long as the number of elements remains the same. You could make a (4, 2) into an (8,). You could make a (4, 3) into a (1, 4, 3, 1). Thus it can accomplish the two tasks done by `squeeze` and `newaxis`.
There is one special syntax element to reshaping: `-1` dimensions. `-1` can appear once in a reshape command and means to have the computer figure out what goes there by following the rule that the number of elements in the tensor must remain the same. Let’s see some examples.
a = np.arange(32)
new_a = np.reshape(a, (4, 8))
new_a.shape
(4, 8)
new_a = np.reshape(a, (4, -1))
new_a.shape
(4, 8)
new_a = np.reshape(a, (1, 2, 2, -1))
new_a.shape
(1, 2, 2, 8)
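To illustrate the claim that `reshape` covers both `squeeze` and `newaxis`, here is a small sketch on a made-up (1, 32, 4, 1) tensor. It also shows NumPy's `-1` placeholder inferring a dimension, and what happens when the element count would change.

```python
import numpy as np

a = np.ones((1, 32, 4, 1))  # 128 elements in total

# reshape can drop the dimension 1 axes, like squeeze
squeezed = np.reshape(a, (32, 4))
# and add one back at the end, like newaxis
expanded = np.reshape(squeezed, (32, 4, 1))
# -1 lets NumPy infer one dimension: 128 elements / 8 = 16
inferred = np.reshape(a, (8, -1))
print(squeezed.shape, expanded.shape, inferred.shape)

# The element count must be preserved, so 128 elements cannot become 5 x 5
try:
    np.reshape(a, (5, 5))
    reshape_failed = False
except ValueError:
    reshape_failed = True
print("reshape to (5, 5) failed:", reshape_failed)
```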
1.4.2. Rank Slicing
I assume you’re familiar with slicing in NumPy/Python. I’ll refer you to the Python Tutorial and the NumPy tutorial for a refresher if you need it. Rank slicing is just my terminology for slicing without knowing the rank of a tensor. Use the `...` (ellipsis) keyword. This allows you to account for unknown rank when slicing. Examples:
Access last axis:
a[...,:]
Access last 2 axes:
a[...,:,:]
Add new axis to end
a[...,np.newaxis]
Add new axis to beginning
a[np.newaxis,...]
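The value of the ellipsis is that one line of code works for any rank. The sketch below defines a hypothetical helper, `first_along_last_axis`, and applies it to tensors of rank 1, 2, and 4.

```python
import numpy as np

def first_along_last_axis(a):
    """Hypothetical helper: take index 0 of the last axis, whatever the rank."""
    return a[..., 0]

for shape in [(5,), (4, 5), (2, 3, 4, 5)]:
    a = np.zeros(shape)
    # Indexing the last axis lowers the rank by one; newaxis raises it by one
    print(shape, "->", first_along_last_axis(a).shape, a[..., np.newaxis].shape)
```

Writing `a[:, :, :, 0]` instead would only work for the rank 4 case.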
Let’s see if we can put together our skills to implement the equation example from above,
\[ g = \exp\left((a - b)^2\right) \]
for arbitrary rank \(a\). Recall \(b\) is a rank 1 tensor and we want \(g\) to have rank one greater than that of \(a\).
def eq(a, b):
    return np.exp((a[..., np.newaxis] - b) ** 2)
b = np.ones(4)
a1 = np.ones( (4,3) )
a2 = np.ones( (4,3, 2, 1) )
g1 = eq(a1, b)
print('input a1:', a1.shape, 'output:', g1.shape)
g2 = eq(a2, b)
print('input a2:', a2.shape, 'output:', g2.shape)
input a1: (4, 3) output: (4, 3, 4)
input a2: (4, 3, 2, 1) output: (4, 3, 2, 1, 4)
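A quick way to convince yourself that the broadcasting version does what the index form says is to compare it with explicit loops. This sketch assumes the operation is \(\exp((a - b)^2)\), with subtraction between the new-axis tensor and \(b\); the random inputs are made up, and `eq` is redefined so the block runs on its own.

```python
import numpy as np

def eq(a, b):
    # Assumed form: exp((a - b)^2), with b broadcast along a new last axis
    return np.exp((a[..., np.newaxis] - b) ** 2)

rng = np.random.default_rng(2)
a = rng.normal(size=(2, 3))
b = rng.normal(size=4)
g = eq(a, b)

# Explicit loops over every index: g_ijk = exp((a_ij - b_k)^2)
g_loop = np.empty((2, 3, 4))
for i in range(2):
    for j in range(3):
        for k in range(4):
            g_loop[i, j, k] = np.exp((a[i, j] - b[k]) ** 2)

print(g.shape)
print(np.allclose(g, g_loop))
```

The loop version needs one loop per index, so it would have to be rewritten for every rank of `a`, while the broadcasting version does not.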
1.5. Chapter Summary
Tensors are the building blocks of machine learning. A tensor has a rank and a shape that specify how many elements it has and how they are arranged. Each axis corresponds to one entry in the shape.
A Euclidean vector is a rank 1 tensor with shape (3). It has 1 axis of dimension 3. A matrix is a rank 2 tensor. It has two axes.
Equations that describe operating on 1 or more tensors can be written using Einstein notation. Einstein notation uses indices to indicate the shape of tensors, how things are summed, and which axes must match up.
There are operations that reduce ranks of tensors, like sum or mean.
Broadcasting is an automatic tool in programming languages that modifies shapes of tensors to make them compatible with operations.
Tensors can be reshaped or have their rank modified by `newaxis`, `reshape`, and `squeeze`. These are not standardized among the various numeric libraries in Python.
1.6. Exercises
1.6.1. Einstein notation
Write out the following in Einstein notation:
Product of two matrices
Trace of a matrix
Outer product of two Euclidean vectors
\(\mathbf{A}\) is a rank 3 tensor whose last axis is dimension 3 and contains Euclidean vectors. \(\mathbf{B}\) is a Euclidean vector. Compute the dot product of each of the vectors in \(\mathbf{A}\) with \(\mathbf{B}\). So if \(\mathbf{A}\) is shape (11, 7, 3), it contains 11 \(\times\) 7 vectors and the output should be shape (11,7). \(\mathbf{B}\) is shape (3).
1.6.2. Reductions
Answer the following with Python code using reductions. Write your code to be as general as possible, able to take arbitrary rank tensors unless it is specified that something is a vector.
Normalize a vector so that the sum of its elements is 1. Note the rank of the vector should be unchanged.
Normalize the last axis of a tensor
Compute the mean squared error between two tensors
Compute the mean squared error between the last axis of tensor \(\mathbf{A}\) and vector \(\vec{b}\)
1.6.3. Broadcasting and Shapes
Consider two vectors \(\vec{a}\) and \(\vec{b}\). Using reshaping and broadcasting alone, write Python code to compute their outer product.
Why is the code `a.reshape((-1, 3, -1))` invalid?
You have a tensor of unknown rank \(\mathbf{A}\) and would like to subtract 3.5 and 2.5 from every element so that your output, which is a new tensor \(\mathbf{B}\), has rank \(\textrm{rank}(\mathbf{A}) + 1\). The last axis of \(\mathbf{B}\) should be dimension 2.