
Game-AI learner applied to mathematics: matrix multiplication

lpetrich

The software that learned how to play chess and Go has now been adapted to a problem in mathematics: doing matrix multiplication efficiently, where a matrix is a rectangular array of numbers or number-like entities. For multiplying an m*n matrix by an n*p matrix, the usual algorithm requires m*n*p multiplications and m*n*p additions. If one avoids the first addition for each output value, the number of additions drops to m*(n-1)*p. Thus, for two 2*2 matrices, one needs only 4 additions and 8 multiplications. One can improve on that with the Strassen algorithm: it does some of the additions before the multiplications, and it needs 18 additions but only 7 multiplications. If multiplication is much more computationally costly than addition, then this rearrangement can be worthwhile.

Breaking the matrices to be multiplied into 2*2 arrays of submatrices, one can apply Strassen's algorithm to them, recursing on each of the seven submatrix multiplications. The asymptotic operation count is then \( n^{\log_2 7} \approx n^{2.8074} \) instead of \( n^3 \).
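Here is a minimal Python sketch of that recursion, for square matrices whose size is a power of two (my own illustration, not DeepMind's code):
Code:
import numpy as np

def strassen(A, B):
    """Multiply square matrices whose size is a power of two,
    using Strassen's seven-product recursion."""
    n = A.shape[0]
    if n == 1:
        return A * B  # a single ordinary multiplication
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # do the additions first, then the seven (recursive) multiplications
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4,           M1 - M2 + M3 + M6]])

# quick check against the ordinary algorithm
A = np.random.rand(8, 8)
B = np.random.rand(8, 8)
print(np.allclose(strassen(A, B), A @ B))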

While 7 is the proven minimum number of multiplications for 2*2 matrix multiplication, there is room for improvement with larger matrices, and DeepMind has adapted AlphaZero to search for more efficient algorithms: AlphaTensor.

 
A matrix is a rectangular array of numbers or number-like entities, and one multiplies two matrices by taking the inner product of each row of the first with each column of the second, forming a matrix of those inner products.

For C = A * B we have

\( \displaystyle{ C_{ij} = \sum_k A_{ik} B_{kj} } \)

A simple way of doing the sum is to start from 0 and add each product to it. But one can shave off a bit of time by starting with the first product and adding the rest of the products to it.
Code:
def matmul(A, B):
    # A is m*n, B is n*p, both as lists of rows; returns C = A*B
    m, n, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            s = 0.0
            for k in range(n):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C
and sped up a little:
Code:
def matmul(A, B):
    # same as above, but each sum starts with the first product,
    # saving one addition per output entry
    m, n, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            s = A[i][0] * B[0][j]
            for k in range(1, n):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C
 
How DeepMind did it:
First, we converted the problem of finding efficient algorithms for matrix multiplication into a single-player game. In this game, the board is a three-dimensional tensor (array of numbers), capturing how far from correct the current algorithm is. Through a set of allowed moves, corresponding to algorithm instructions, the player attempts to modify the tensor and zero out its entries. When the player manages to do so, this results in a provably correct matrix multiplication algorithm for any pair of matrices, and its efficiency is captured by the number of steps taken to zero out the tensor.
For instance, multiplying 4*5 and 5*5 with the simplest algorithm requires 100 multiplications, and an ingenious mathematician can get that down to 80, but AlphaTensor can get it down to 76.

For 11*12 and 12*12, the simplest algorithm requires 1584 multiplications, and previous work got that down to 1022, but AlphaTensor got it down to 990.

AlphaTensor can also optimize multiplication over the finite field GF(2): the numbers 0 and 1 under addition and multiplication modulo 2. There, 1 + 1 = 0 and -1 = 1, so subtraction is the same as addition.
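A small illustration of that arithmetic (my own sketch, using NumPy integer arrays): the ordinary integer product is formed and then reduced mod 2, and since -1 = 1 there, any subtractions in a fancier scheme become additions.
Code:
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 2, (4, 4))   # a random 4*4 matrix over GF(2)
B = rng.integers(0, 2, (4, 4))

C = (A @ B) % 2   # multiply as ordinary integers, then reduce modulo 2
print(C)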

There, for two mod-2 4*4 matrices, two iterations of Strassen's algorithm give 49 multiplications, while AlphaTensor found 47. For two mod-2 5*5 ones, AlphaTensor brought the previous best of 98 down to 96.

Strassen's algorithm and AlphaTensor's improvements can in general be expressed symbolically as

C = W . ( (U.A) * (V.B) )

where A, B, and C are flattened to vectors, * is element-by-element multiplication, and U, V, and W are matrices to be found.

\( \displaystyle{ C_i = \sum_j W_{ij} \left(\sum_k U_{jk} A_k\right) \left(\sum_k V_{jk} B_k\right) } \)
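As a concrete check, here is Strassen's 2*2 scheme written in that form, with the classical Strassen coefficients filled into U, V, and W (my own NumPy sketch; these particular coefficients are the classical ones, not something AlphaTensor found):
Code:
import numpy as np

# A, B flattened row by row: (A11, A12, A21, A22), likewise for B and C
U = np.array([[ 1, 0, 0, 1],    # M1 = (A11 + A22)(B11 + B22)
              [ 0, 0, 1, 1],    # M2 = (A21 + A22) B11
              [ 1, 0, 0, 0],    # M3 = A11 (B12 - B22)
              [ 0, 0, 0, 1],    # M4 = A22 (B21 - B11)
              [ 1, 1, 0, 0],    # M5 = (A11 + A12) B22
              [-1, 0, 1, 0],    # M6 = (A21 - A11)(B11 + B12)
              [ 0, 1, 0,-1]])   # M7 = (A12 - A22)(B21 + B22)
V = np.array([[ 1, 0, 0, 1],
              [ 1, 0, 0, 0],
              [ 0, 1, 0,-1],
              [-1, 0, 1, 0],
              [ 0, 0, 0, 1],
              [ 1, 1, 0, 0],
              [ 0, 0, 1, 1]])
W = np.array([[ 1, 0, 0, 1,-1, 0, 1],   # C11 = M1 + M4 - M5 + M7
              [ 0, 0, 1, 0, 1, 0, 0],   # C12 = M3 + M5
              [ 0, 1, 0, 1, 0, 0, 0],   # C21 = M2 + M4
              [ 1,-1, 1, 0, 0, 1, 0]])  # C22 = M1 - M2 + M3 + M6

A = np.random.rand(2, 2)
B = np.random.rand(2, 2)
c = W @ ((U @ A.flatten()) * (V @ B.flatten()))   # only 7 element-wise products
print(np.allclose(c.reshape(2, 2), A @ B))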

From DeepMind's blog:
Algorithms in this rich space have different mathematical and practical properties. Leveraging this diversity, we adapted AlphaTensor to specifically find algorithms that are fast on a given hardware, such as Nvidia V100 GPU, and Google TPU v2. These algorithms multiply large matrices 10-20% faster than the commonly used algorithms on the same hardware, which showcases AlphaTensor’s flexibility in optimising arbitrary objectives.
AlphaTensor at GitHub
 
From Quanta Magazine,
When the DeepMind paper came out, Kauers and Moosbauer were in the process of searching for new multiplication algorithms using a conventional computer-aided search. But unlike most such searches, which start afresh with a new guiding principle, their method works by repeatedly tweaking an existing algorithm in hopes of squeezing more multiplication savings out of it. Taking AlphaTensor’s algorithm for 5-by-5 modulo 2 matrices as a starting point, they were surprised to find that their method reduced the number of multiplication steps from 96 to 95 after just a few seconds of computation.
noting
[2210.04045] The FBHHRBNRSSSHK-Algorithm for Multiplication in $\mathbb{Z}_2^{5\times5}$ is still not the end of the story by Manuel Kauers and Jakob Moosbauer

Thus beating AlphaTensor.
AlphaTensor also helped them make another improvement indirectly. Previously, Kauers and Moosbauer hadn’t bothered to explore the space of 4-by-4 matrices, believing that it would not be possible to beat two iterations of Strassen’s algorithm. The AlphaTensor result prompted them to reconsider, and after a week of computation time starting from scratch, their method turned up another 47-step algorithm unrelated to the one AlphaTensor had discovered. “If somebody had told us that there is something to discover for 4-by-4, we could have done that earlier,” said Kauers. “But OK, well, that’s how it works.”

Authors Kauers and Moosbauer note that for 3*3 matrices, the best known algorithm needs 23 multiplications, as opposed to the simple algorithm's 27.

Asymptotic limits for n*n multiplication: simple \( n^3 \), Strassen \( n^{2.80735} \), best so far \( n^{2.3728596} \), slightly beating the previous record of \( n^{2.3728639} \).
 
So far, AlphaTensor at best does only a little better than extrapolations of Strassen's algorithm, and sometimes a little worse. Worse? Yes: for matrix sizes that are not powers of 2, the exponent of the size comes out a little greater than Strassen's.

For two n*n matrices, the flattened A, B, and C have length \( n^2 \), and the number of multiplications has an upper limit of \( n^3 \) and a typical value of around \( n^{2.8} \). That means that U, V, and W have sizes with an upper limit of about \( n^5 \) and typical sizes of around \( n^{4.8} \). Even if one stays within GF(2), the search space gets very big very quickly: on the order of \( 2^{n^5} \) or \( 2^{n^{4.8}} \) possible coefficient sets.
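To make that concrete (my own back-of-the-envelope count): for \( n = 4 \) and a 47-multiplication scheme over GF(2), U and V are 47*16 matrices and W is 16*47, so there are \( 2^{3 \cdot 47 \cdot 16} = 2^{2256} \) possible choices of coefficients to search among.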
 
For instance, multiplying 4*5 and 5*5 with the simplest algorithm requires 100 multiplications, and an ingenious mathematician can get that down to 80, but AlphaTensor can get it down to 76.
Nah, 4*5 is 20, and 5*5 is 25.
But look at multiplying the two. The first one has 4 rows of length 5, and the second has 5 columns of length 5. Each row-column inner product requires 5 multiplications, and doing one for every first-matrix row and second-matrix column gives 4*5*5 = 100 multiplications.
 
This has potentially significant ramifications in many fields of computer simulation.

FEA and CFD are the two I'm most familiar with.
I skimmed your last few posts, but do these computational shortcuts produce inaccuracies?

Some calculations might be OK with some margin of error, but some fields require precision, even if it's computationally expensive.
 
I skimmed your last few posts, but do these computational shortcuts produce inaccuracies?
This question has two parts.

First, as far as I can tell, these algorithms are exact. They can be guaranteed to be exact by choosing coefficients that are small integers, like -1, 0, and 1 -- all the coefficient values I've seen in the articles' examples.

Second, how much of a problem is roundoff error for them? These methods involve more additions than the simplest method, despite their fewer multiplications, so that might be a problem.
 