
Game-AI learner applied to mathematics: matrix multiplication

lpetrich

The software that learned how to play chess and Go has now been adapted to a problem in mathematics: doing matrix multiplication efficiently, where a matrix is a rectangular array of numbers or number-like entities. For multiplying an m*n matrix by an n*p matrix, the usual algorithm requires m*n*p multiplications and m*n*p additions. If one avoids the first addition for each output value, the number of additions drops to m*(n-1)*p. Thus, for two 2*2 matrices, one needs only 4 additions and 8 multiplications. One can improve on that with the Strassen algorithm: it does some of the additions before the multiplications, and it needs 18 additions but only 7 multiplications. If multiplication is much more computationally costly than addition, then this rearrangement can be worthwhile.

Breaking the matrices to be multiplied into 2*2 arrays of submatrices, one can apply Strassen's algorithm to them, recursing on each of the seven submatrix multiplications. The asymptotic operation count is then \( n^{\log_2 7} \approx n^{2.8074} \) instead of \( n^3 \).
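Here is a minimal Python sketch of that recursion, for square matrices whose size is a power of two (my own illustration, not DeepMind's code):
Code:
import numpy as np

def strassen(A, B):
    """Multiply square matrices whose size is a power of two,
    using Strassen's seven-product recursion."""
    n = A.shape[0]
    if n == 1:
        return A * B  # a single ordinary multiplication
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # do the additions first, then the seven (recursive) multiplications
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4,           M1 - M2 + M3 + M6]])

# quick check against the ordinary algorithm
A = np.random.rand(8, 8)
B = np.random.rand(8, 8)
print(np.allclose(strassen(A, B), A @ B))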

While 7 is the proven minimum number of multiplications for 2*2 matrix multiplication, there is room for improvement with larger matrices, and DeepMind has adapted AlphaZero to search for more efficient algorithms: AlphaTensor.

 
A matrix is a rectangular array of numbers or number-like entities, and one multiplies two matrices by taking the inner product of each row of the first with each column of the second, forming a matrix of those inner products.

For C = A * B we have

\( \displaystyle{ C_{ij} = \sum_k A_{ik} B_{kj} } \)

A simple way of doing the sum is to start from 0 and add each product to it. But one can shave off a bit of time by starting with the first product and adding the rest of the products to it.
Code:
def matmul(A, B):
    # A is m*n, B is n*p, both as lists of rows; returns C = A*B
    m, n, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            s = 0.0
            for k in range(n):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C
and sped up a little:
Code:
def matmul(A, B):
    # same as above, but each sum starts with the first product,
    # saving one addition per output entry
    m, n, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            s = A[i][0] * B[0][j]
            for k in range(1, n):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C
 
How DeepMind did it:
First, we converted the problem of finding efficient algorithms for matrix multiplication into a single-player game. In this game, the board is a three-dimensional tensor (array of numbers), capturing how far from correct the current algorithm is. Through a set of allowed moves, corresponding to algorithm instructions, the player attempts to modify the tensor and zero out its entries. When the player manages to do so, this results in a provably correct matrix multiplication algorithm for any pair of matrices, and its efficiency is captured by the number of steps taken to zero out the tensor.
For instance, multiplying 4*5 and 5*5 with the simplest algorithm requires 100 multiplications, and an ingenious mathematician can get that down to 80, but AlphaTensor can get it down to 76.

For 11*12 and 12*12, the simplest algorithm requires 1584 multiplications, and previous work got that down to 1022, but AlphaTensor got it down to 990.

AlphaTensor can also optimize multiplication over the finite field GF(2): the numbers 0 and 1 under addition and multiplication modulo 2. There, 1 + 1 = 0 and -1 = 1, so subtraction is the same as addition.
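A small illustration of that arithmetic (my own sketch, using NumPy integer arrays): the ordinary integer product is formed and then reduced mod 2, and since -1 = 1 there, any subtractions in a fancier scheme become additions.
Code:
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 2, (4, 4))   # a random 4*4 matrix over GF(2)
B = rng.integers(0, 2, (4, 4))

C = (A @ B) % 2   # multiply as ordinary integers, then reduce modulo 2
print(C)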

There, for two mod-2 4*4 matrices, two iterations of Strassen's algorithm give 49 multiplications, while AlphaTensor found 47. For two mod-2 5*5 ones, AlphaTensor brought the previous best of 98 down to 96.

Strassen's algorithm and AlphaTensor's improvements can in general be expressed symbolically as

C = W . ( (U.A) * (V.B) )

where A, B, and C are flattened to vectors, * is element-by-element multiplication, and U, V, and W are matrices to be found.

\( \displaystyle{ C_i = \sum_j W_{ij} \left(\sum_k U_{jk} A_k\right) \left(\sum_k V_{jk} B_k\right) } \)
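As a concrete check, here is Strassen's 2*2 scheme written in that form, with the classical Strassen coefficients filled into U, V, and W (my own NumPy sketch; these particular coefficients are the classical ones, not something AlphaTensor found):
Code:
import numpy as np

# A, B flattened row by row: (A11, A12, A21, A22), likewise for B and C
U = np.array([[ 1, 0, 0, 1],    # M1 = (A11 + A22)(B11 + B22)
              [ 0, 0, 1, 1],    # M2 = (A21 + A22) B11
              [ 1, 0, 0, 0],    # M3 = A11 (B12 - B22)
              [ 0, 0, 0, 1],    # M4 = A22 (B21 - B11)
              [ 1, 1, 0, 0],    # M5 = (A11 + A12) B22
              [-1, 0, 1, 0],    # M6 = (A21 - A11)(B11 + B12)
              [ 0, 1, 0,-1]])   # M7 = (A12 - A22)(B21 + B22)
V = np.array([[ 1, 0, 0, 1],
              [ 1, 0, 0, 0],
              [ 0, 1, 0,-1],
              [-1, 0, 1, 0],
              [ 0, 0, 0, 1],
              [ 1, 1, 0, 0],
              [ 0, 0, 1, 1]])
W = np.array([[ 1, 0, 0, 1,-1, 0, 1],   # C11 = M1 + M4 - M5 + M7
              [ 0, 0, 1, 0, 1, 0, 0],   # C12 = M3 + M5
              [ 0, 1, 0, 1, 0, 0, 0],   # C21 = M2 + M4
              [ 1,-1, 1, 0, 0, 1, 0]])  # C22 = M1 - M2 + M3 + M6

A = np.random.rand(2, 2)
B = np.random.rand(2, 2)
c = W @ ((U @ A.flatten()) * (V @ B.flatten()))   # only 7 element-wise products
print(np.allclose(c.reshape(2, 2), A @ B))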

From DeepMind's blog:
Algorithms in this rich space have different mathematical and practical properties. Leveraging this diversity, we adapted AlphaTensor to specifically find algorithms that are fast on a given hardware, such as Nvidia V100 GPU, and Google TPU v2. These algorithms multiply large matrices 10-20% faster than the commonly used algorithms on the same hardware, which showcases AlphaTensor’s flexibility in optimising arbitrary objectives.
AlphaTensor at GitHub
 
From Quanta Magazine,
When the DeepMind paper came out, Kauers and Moosbauer were in the process of searching for new multiplication algorithms using a conventional computer-aided search. But unlike most such searches, which start afresh with a new guiding principle, their method works by repeatedly tweaking an existing algorithm in hopes of squeezing more multiplication savings out of it. Taking AlphaTensor’s algorithm for 5-by-5 modulo 2 matrices as a starting point, they were surprised to find that their method reduced the number of multiplication steps from 96 to 95 after just a few seconds of computation.
noting
[2210.04045] The FBHHRBNRSSSHK-Algorithm for Multiplication in $\mathbb{Z}_2^{5\times5}$ is still not the end of the story by Manuel Kauers and Jakob Moosbauer

Thus beating AlphaTensor.
AlphaTensor also helped them make another improvement indirectly. Previously, Kauers and Moosbauer hadn’t bothered to explore the space of 4-by-4 matrices, believing that it would not be possible to beat two iterations of Strassen’s algorithm. The AlphaTensor result prompted them to reconsider, and after a week of computation time starting from scratch, their method turned up another 47-step algorithm unrelated to the one AlphaTensor had discovered. “If somebody had told us that there is something to discover for 4-by-4, we could have done that earlier,” said Kauers. “But OK, well, that’s how it works.”

Authors Kauers and Moosbauer note that for 3*3 matrices, the best known algorithm needs 23 multiplications, as opposed to the simple algorithm's 27.

Asymptotic limits for n*n multiplication: simple \( n^3 \), Strassen \( n^{2.80735} \), best so far \( n^{2.3728596} \), slightly beating the previous record of \( n^{2.3728639} \).
 
So far, AlphaTensor at best does only a little better than extrapolations of Strassen's algorithm, and sometimes a little worse. Worse? Yes: for matrix sizes that are not powers of 2, the exponent of the size comes out a little greater than Strassen's.

For two n*n matrices, the flattened A, B, and C have length \( n^2 \), and the number of multiplications has an upper limit of \( n^3 \) and a typical value of around \( n^{2.8} \). That means that U, V, and W have sizes with an upper limit of about \( n^5 \) and typical sizes of around \( n^{4.8} \). Even if one stays within GF(2), the search space gets very big very quickly: on the order of \( 2^{n^5} \) or \( 2^{n^{4.8}} \) possible coefficient sets.
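To make that concrete (my own back-of-the-envelope count): for \( n = 4 \) and a 47-multiplication scheme over GF(2), U and V are 47*16 matrices and W is 16*47, so there are \( 2^{3 \cdot 47 \cdot 16} = 2^{2256} \) possible choices of coefficients to search among.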
 
For instance, multiplying 4*5 and 5*5 with the simplest algorithm requires 100 multiplications, and an ingenious mathematician can get that down to 80, but AlphaTensor can get it down to 76.
Nah, 4*5 is 20, and 5*5 is 25.
But look at multiplying the two. The first one has 4 rows of length 5, and the second has 5 columns of length 5. Each row-column inner product requires 5 multiplications, and doing one for every first-matrix row and second-matrix column gives 4*5*5 = 100 multiplications.
 
This has potentially significant ramifications in many fields of computer simulation.

FEA and CFD are the two I'm most familiar with.
I skimmed your last few posts, but do these computational shortcuts produce inaccuracies?

Some calculations might be OK with some margin of error, but some fields require precision, even if it's computationally expensive.
 
I skimmed your last few posts, but do these computational shortcuts produce inaccuracies?
This question has two parts.

First, as far as I can tell, these algorithms are exact. They can be guaranteed to be exact by choosing coefficients that are small integers, like -1, 0, and 1 -- all the coefficient values I've seen in the articles' examples.

Second, how much of a problem is roundoff error for them? These methods involve more additions than the simplest method, despite their fewer multiplications, so that might be a problem.
 