From 75a6921af214a4d7157627524d916a5bda7d1406 Mon Sep 17 00:00:00 2001
From: SIPB
Date: Tue, 10 Dec 2024 23:00:13 -0500
Subject: add

---
 blog.md    | 4 ++--
 index.html | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/blog.md b/blog.md
index 03ec25f..0e372b8 100644
--- a/blog.md
+++ b/blog.md
@@ -157,9 +157,9 @@ Finally, the last head will be in charge of noticing whether vertex 1 has reache
 The field of Singular Learning Theory (SLT; see Liam Carroll's Master's thesis "Phase Transitions in Neural Networks" for an introduction) aims to understand model training and loss-landscape geometry. To better understand the loss landscape of the shortest-paths loss function (with the tokenization used in our hand-coded implementation of the shortest-paths transformer), we decided to start at a good setting of the parameters, perturb the weights, and see whether the model could subsequently recover low loss. The intuition for why this measures how "attractive" our loss basin is comes from the similarity of this experiment to the Local Learning Coefficient from SLT (see Lau, Edmund, Zach Furman, George Wang, Daniel Murfet, and Susan Wei, "The Local Learning Coefficient: A Singularity-Aware Complexity Measure"). We found that perturbing the weights led to high loss, but gradient descent was able to recover low loss, indicating that the solution is somewhat "findable" by gradient descent.
-![perturb.png](perturb.png)
+![perturb.png](img/perturb.png)
-![perturb-loss.png](perturb-loss.png)
+![perturb-loss.png](img/perturb-loss.png)
 ## Training
diff --git a/index.html b/index.html
index c3c4108..e6e6418 100644
--- a/index.html
+++ b/index.html
@@ -601,11 +601,11 @@ loss, indicating that the solution is somewhat “findable” by gradient descent.

[index.html hunk: the rendered image references (captions "perturb.png" and "perturb-loss.png") are likewise updated to point at img/perturb.png and img/perturb-loss.png; the surrounding HTML markup of the changed lines was lost in extraction]
--
cgit v1.2.3-70-g09d2
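
For readers following the perturbation experiment referenced in the blog.md hunk above, here is a minimal sketch of that procedure, assuming a PyTorch model initialized at the good (hand-coded) parameter setting, a shortest-paths loss function, and a data loader. The function name `perturb_and_recover`, the noise scale, the optimizer, and the step counts are illustrative assumptions, not the repository's actual code.

```python
# Illustrative sketch only (assumed names and hyperparameters, not the repo's code):
# perturb a low-loss model's weights with Gaussian noise, then run gradient descent
# and record whether low loss is recovered, as in the experiment described above.
import torch

def perturb_and_recover(model, loss_fn, data_loader, sigma=0.02, epochs=5, lr=1e-3):
    # Add N(0, sigma^2) noise to every parameter, moving away from the good solution.
    with torch.no_grad():
        for p in model.parameters():
            p.add_(sigma * torch.randn_like(p))

    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    losses = []
    for _ in range(epochs):
        for inputs, targets in data_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
            losses.append(loss.item())
    # A spike in loss right after the perturbation followed by recovery to low loss
    # suggests the hand-coded solution sits in a basin that gradient descent can find.
    return losses
```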