Need double backslash \\[ \\] because of Markdown escaping

author: Anthony Wang 2024-12-12 19:55:43 -0500
committer: Anthony Wang 2024-12-12 19:55:43 -0500
commit: ed3ab4624ec9b611fe1f634f8c17363217a311e6 (patch)
tree: 577774576452de194c591cf7e4b6fe336896c231
parent: 494af1d4f114a566a90e023d3322c5eb068505b3 (diff)
1 files changed, 1 insertions, 1 deletions
diff --git a/content/posts/solving-shortest-paths-with-transformers.md b/content/posts/solving-shortest-paths-with-transformers.md
index 859f6e8..fdd5fc6 100644
--- a/content/posts/solving-shortest-paths-with-transformers.md
+++ b/content/posts/solving-shortest-paths-with-transformers.md
@@ -185,7 +185,7 @@ For our training run, we used the following specifications:
 | Optimizer               | Adam            |
 
 The number of bits required to store the model parameters in float32 is around $1.76\cdot10^6$. The number of possible graphs on 15 vertices generated using our procedure is approximately
-\[\frac{\binom{15}{2}^{15}}{15!} \approx 1.59\cdot10^{18}.\]
+\\[\frac{\binom{15}{2}^{15}}{15!} \approx 1.59\cdot10^{18}.\\]
 This is because there are $\binom{15}{2}$ choices for each of the 15 edges and we don't care about the order of the edges. This is only an approximation because some edges might be duplicated. Each graph has an answer between 1 and 15, which requires around 4 bits, so memorizing all the answers requires $4\cdot1.59\cdot10^{18} = 6.36\cdot10^{18}$ bits, which is $3.61\cdot10^{12}$ times larger than our model size. This implies that in order to get really low loss, our model needs to do something other than brute memorization.
 
 A single training run takes roughly three hours on a Radeon 7900 XTX graphics card.
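
Note on the arithmetic in the hunk above: the counting estimate can be checked with a few lines of Python. This is only a sketch and not part of the commit. The constant `MODEL_BITS` is copied from the post's stated $1.76\cdot10^6$ figure, and the helper name `approx_graph_count` is made up for illustration.

```python
from math import ceil, comb, factorial, log2

N_VERTICES = 15      # vertices per graph, as stated in the post
N_EDGES = 15         # edges drawn per graph, as stated in the post
MODEL_BITS = 1.76e6  # parameter storage in bits, figure quoted from the post


def approx_graph_count(n_vertices: int, n_edges: int) -> float:
    """Ordered edge choices divided by edge orderings (ignores duplicate edges)."""
    return comb(n_vertices, 2) ** n_edges / factorial(n_edges)


graphs = approx_graph_count(N_VERTICES, N_EDGES)

# An answer between 1 and 15 takes ceil(log2(15)) bits to store.
bits_per_answer = ceil(log2(N_VERTICES))

total_bits = bits_per_answer * graphs
ratio = total_bits / MODEL_BITS

print(f"graphs          ~ {graphs:.2e}")
print(f"bits per answer = {bits_per_answer}")
print(f"bits to memorize ~ {total_bits:.2e}")
print(f"ratio to model  ~ {ratio:.2e}")
```

Running it should give values that round to the post's $1.59\cdot10^{18}$ graphs, $6.36\cdot10^{18}$ bits, and a ratio of about $3.61\cdot10^{12}$, so the numbers in the changed paragraph are consistent with each other.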