author Anthony Wang 2024-12-12 19:55:43 -0500
committer Anthony Wang 2024-12-12 19:55:43 -0500
commit ed3ab4624ec9b611fe1f634f8c17363217a311e6
tree 577774576452de194c591cf7e4b6fe336896c231
parent 494af1d4f114a566a90e023d3322c5eb068505b3
Need double slash \\[ \\] due to Markdown escaping reasons
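Why the doubled backslash is needed can be illustrated with a minimal Python sketch of CommonMark's backslash-escape rule: a backslash followed by an ASCII punctuation character produces that character literally. This assumes the site's Markdown renderer applies that rule before the math renderer sees the text (the commit does not name either renderer). The function `unescape_markdown` is hypothetical, and unlike a real renderer it does not skip code spans.

```python
import string

# CommonMark allows any ASCII punctuation character to be backslash-escaped;
# string.punctuation is exactly that set.
ESCAPABLE = set(string.punctuation)


def unescape_markdown(text: str) -> str:
    """Hypothetical sketch of CommonMark backslash escapes (ignores code spans)."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in ESCAPABLE:
            # Escaped punctuation: drop the backslash, keep the character.
            out.append(text[i + 1])
            i += 2
        else:
            # Anything else (including \frac, \cdot) passes through unchanged.
            out.append(ch)
            i += 1
    return "".join(out)


old = r"\[\frac{\binom{15}{2}^{15}}{15!} \approx 1.59\cdot10^{18}.\]"
new = r"\\[\frac{\binom{15}{2}^{15}}{15!} \approx 1.59\cdot10^{18}.\\]"
print(unescape_markdown(old))  # delimiters reduced to plain [ and ]
print(unescape_markdown(new))  # \[ and \] survive for the math renderer
```

With a single backslash, `\[` collapses to `[` and the display-math delimiter disappears; doubling it leaves `\[ ... \]` in the rendered output. Commands such as `\frac` are unaffected because `f` is not punctuation.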
-rw-r--r-- content/posts/solving-shortest-paths-with-transformers.md 2
1 files changed, 1 insertions, 1 deletions
diff --git a/content/posts/solving-shortest-paths-with-transformers.md b/content/posts/solving-shortest-paths-with-transformers.md
index 859f6e8..fdd5fc6 100644
--- a/content/posts/solving-shortest-paths-with-transformers.md
+++ b/content/posts/solving-shortest-paths-with-transformers.md
@@ -185,7 +185,7 @@ For our training run, we used the following specifications:
| Optimizer | Adam |
The number of bits required to store the model parameters in float32 is around $1.76\cdot10^6$. The number of possible graphs on 15 vertices generated using our procedure is approximately
-\[\frac{\binom{15}{2}^{15}}{15!} \approx 1.59\cdot10^{18}.\]
+\\[\frac{\binom{15}{2}^{15}}{15!} \approx 1.59\cdot10^{18}.\\]
This is because there are $\binom{15}{2}$ choices for each of the 15 edges and we don't care about the order of the edges. This is only an approximation because some edges might be duplicated. Each graph has an answer between 1 and 15 which requires around 4 bits, so memorizing all the answers requires $4\cdot1.59\cdot10^{18} = 6.36\cdot10^{18}$ bits, which is $3.61\cdot10^{12}$ times larger than our model size. This implies that in order to get really low loss, our model needs to do something other than brute memorization.
A single training run takes roughly three hours on a Radeon 7900 XTX graphics card.
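The counting argument in the post's paragraph can be checked with a short script. This is a sketch that reproduces the post's own estimate, not an exact count: dividing by $15!$ only discounts edge orderings, and graphs with duplicated edges are over-discounted, as the post notes. The names `N`, `PARAM_BITS`, and the other variables are hypothetical; the constants come from the text.

```python
from math import ceil, comb, factorial, log2

N = 15               # vertices, and edges sampled per graph, as in the post
PARAM_BITS = 1.76e6  # model size in bits (float32 parameters), as stated in the post

# Each of the N edges is one of C(N, 2) vertex pairs; dividing by N! ignores
# the order in which edges were drawn. Approximate because of duplicate edges.
num_graphs = comb(N, 2) ** N / factorial(N)

# An answer in 1..N needs ceil(log2(N)) bits, which is 4 for N = 15.
bits_per_answer = ceil(log2(N))
memorization_bits = bits_per_answer * num_graphs
ratio = memorization_bits / PARAM_BITS

print(f"graphs            ~ {num_graphs:.3g}")
print(f"memorization bits ~ {memorization_bits:.3g}")
print(f"ratio to model    ~ {ratio:.3g}")
```

Running it gives roughly $1.59\cdot10^{18}$ graphs, $6.36\cdot10^{18}$ bits, and a ratio of about $3.61\cdot10^{12}$, matching the figures quoted above.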