"just run the algorithm 5000 times"

author: Anthony Wang 2025-05-05 00:24:18 -0400
committer: Anthony Wang 2025-05-05 00:24:18 -0400
commit: 62b9256407c8ea5fb6595591669523f89ea28e08 (patch)
tree: 29acab891dce311105ae426a38a97d7718bc1a82
parent: b49eeb1e204770a23844af32605478d53877ffb4 (diff)
3 files changed, 2 insertions, 2 deletions
diff --git a/README.md b/README.md
index 9f507d7..eab1eec 100644
--- a/README.md
+++ b/README.md
@@ -35,7 +35,7 @@ I trained the model on WikiText-103 for 20 epochs using an AMD Radeon 7900XTX. (
 
 ## Decoding
 
-For the no-breakpoint decoding, my algorithm alternates between running MCMC using bigram frequencies and running MCMC using the language model. Specifically, the language model reports for each character the distribution for which it should be replaced with, which I use to build a 28x28 matrix for how likely we should make each possible swap. My algorithm first uses the bigram frequencies to get an almost-correct answer, since the language model only works well when almost all the characters are correct. My program then repeats this 50 times and uses the answer with the lowest loss according to the language model.
+For the no-breakpoint decoding, my algorithm first runs MCMC using bigram and trigram frequencies and then switches to MCMC using the language model. The bigram and trigram step is necessary to compute an almost-correct answer, since the language model gets confused when many of the characters are wrong. The language model computes for each character the distribution for which it should be replaced with, which I use to build a 28x28 matrix for how likely we should make each possible swap. My program then repeats this at most 5000 times and uses the answer with the lowest loss according to the language model.
 
 To deal with breakpoints, my algorithm splits the input in half and decodes each half separately. If the first half was decoded correctly, it tries to decode the second half using the same permutation and uses the language model to detect when the text turns from coherent to gibberish. This narrows down the breakpoint location to within 20 characters, and then my algorithm brute-forces all those locations and picks the best one. Likewise, if the second half was decoded correctly, we repeat the same process. It's possible for both halves to decode correctly or almost correctly if the breakpoint is near the middle.
 
diff --git a/report.pdf b/report.pdf
index 3acedf9..694b0a6 100644
--- a/report.pdf
+++ b/report.pdf
diff --git a/src/main.rs b/src/main.rs
index fea7c1f..3b47cf9 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -313,7 +313,7 @@ fn decode(
     let mut lossbest = 100.;
     let mut rng = Xoshiro256PlusPlus::from_rng(&mut rand::rng());
     let mut refcnt = 0;
-    for _ in 0..10000 {
+    for _ in 0..5000 {
         let mut p = porig.clone();
         let mut lp = logprob(&s, &p, &cnts, &grams);
         for _ in 0..10000 {
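
The loop touched in `src/main.rs` follows a random-restart pattern: each outer iteration starts again from `porig`, runs an inner MCMC chain, and the best result across restarts is kept. Below is a minimal sketch of that pattern, not the code from this repository. The names `XorShift64`, `toy_score`, and `restart_search` are hypothetical, the toy score stands in for the n-gram `logprob`, and the sketch ranks restarts by the same score that drives the chain, whereas the README says the real program ranks them by language-model loss.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
/// (Hypothetical stand-in for the Xoshiro256PlusPlus RNG used in the diff.)
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Pseudo-random index in 0..n (n must be non-zero).
    fn below(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }

    /// Pseudo-random float in [0, 1).
    fn unit(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Toy log-score (assumption, for illustration only): minus the number of
/// positions where the permutation differs from the identity.
fn toy_score(p: &[usize]) -> f64 {
    -(p.iter().enumerate().filter(|&(i, &v)| i != v).count() as f64)
}

/// Random-restart Metropolis search over permutations.
/// `score` returns a log-probability-like value (higher is better).
fn restart_search<F: Fn(&[usize]) -> f64>(
    start: &[usize],
    score: F,
    restarts: usize,
    steps: usize,
    seed: u64,
) -> (Vec<usize>, f64) {
    let mut best = start.to_vec();
    let mut best_score = score(start);
    if start.len() < 2 {
        return (best, best_score); // nothing to swap
    }
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    for _ in 0..restarts {
        // Every restart begins from the same starting permutation.
        let mut p = start.to_vec();
        let mut lp = score(&p);
        for _ in 0..steps {
            // Propose swapping two random entries of the permutation.
            let i = rng.below(p.len());
            let j = rng.below(p.len());
            p.swap(i, j);
            let cand = score(&p);
            // Metropolis rule: accept with probability min(1, exp(cand - lp)).
            if cand >= lp || rng.unit() < (cand - lp).exp() {
                lp = cand;
            } else {
                p.swap(i, j); // rejected: undo the swap
            }
        }
        // Keep the best final state seen across all restarts.
        if lp > best_score {
            best_score = lp;
            best = p;
        }
    }
    (best, best_score)
}
```

A call such as `restart_search(&[3, 0, 4, 1, 2], toy_score, 20, 200, 42)` returns a permutation together with its score. The returned score is never worse than the score of the starting permutation, because the running best is initialized from it and only replaced by strictly better results.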