author | Anthony Wang | 2025-05-05 00:24:18 -0400 |
---|---|---|
committer | Anthony Wang | 2025-05-05 00:24:18 -0400 |
commit | 62b9256407c8ea5fb6595591669523f89ea28e08 (patch) | |
tree | 29acab891dce311105ae426a38a97d7718bc1a82 | |
parent | b49eeb1e204770a23844af32605478d53877ffb4 (diff) |
"just run the algorithm 5000 times"
-rw-r--r-- | README.md | 2 |
-rw-r--r-- | report.pdf | bin 133607 -> 136598 bytes |
-rw-r--r-- | src/main.rs | 2 |
3 files changed, 2 insertions, 2 deletions
@@ -35,7 +35,7 @@ I trained the model on WikiText-103 for 20 epochs using an AMD Radeon 7900XTX. (
 ## Decoding
-For the no-breakpoint decoding, my algorithm alternates between running MCMC using bigram frequencies and running MCMC using the language model. Specifically, the language model reports for each character the distribution for which it should be replaced with, which I use to build a 28x28 matrix for how likely we should make each possible swap. My algorithm first uses the bigram frequencies to get an almost-correct answer, since the language model only works well when almost all the characters are correct. My program then repeats this 50 times and uses the answer with the lowest loss according to the language model.
+For the no-breakpoint decoding, my algorithm first runs MCMC using bigram and trigram frequencies and then switches to MCMC using the language model. The bigram and trigram step is necessary to compute an almost-correct answer, since the language model gets confused when many of the characters are wrong. The language model computes for each character the distribution for which it should be replaced with, which I use to build a 28x28 matrix for how likely we should make each possible swap. My program then repeats this at most 5000 times and uses the answer with the lowest loss according to the language model.
 To deal with breakpoints, my algorithm splits the input in half and decodes each half separately. If the first half was decoded correctly, it tries to decode the second half using the same permutation and uses the language model to detect when the text turns from coherent to gibberish. This narrows down the breakpoint location to within 20 characters, and then my algorithm brute-forces all those locations and picks the best one. Likewise, if the second half was decoded correctly, we repeat the same process. It's possible for both halves to decode correctly or almost correctly if the breakpoint is near the middle.
diff --git a/src/main.rs b/src/main.rs
index fea7c1f..3b47cf9 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -313,7 +313,7 @@ fn decode(
     let mut lossbest = 100.;
     let mut rng = Xoshiro256PlusPlus::from_rng(&mut rand::rng());
     let mut refcnt = 0;
-    for _ in 0..10000 {
+    for _ in 0..5000 {
         let mut p = porig.clone();
         let mut lp = logprob(&s, &p, &cnts, &grams);
         for _ in 0..10000 {
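
For readers unfamiliar with the loop this hunk touches, here is a minimal sketch of restarted Metropolis sampling over substitution permutations. It is not the code from `src/main.rs`. The names `XorShift64`, `bigram_logprob`, and `decode_with_restarts` are hypothetical, the score is a plain bigram log-probability (an assumption; the README says the final answer is chosen by language-model loss), and a small hand-written PRNG stands in for `Xoshiro256PlusPlus` so the example needs no external crates.

```rust
/// Small xorshift64* generator, used here only to avoid external crates.
/// (Stand-in for the Xoshiro256PlusPlus RNG seen in the diff.)
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // The state must never be zero for xorshift generators.
        XorShift64(seed.max(1))
    }
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }
    /// Roughly uniform index in 0..n (modulo bias is ignored in this sketch).
    fn below(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
    /// Uniform float in [0, 1).
    fn unit(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Log-probability of the text after mapping each symbol c to perm[c],
/// scored with a bigram table logp[a][b] = log P(b follows a).
fn bigram_logprob(text: &[usize], perm: &[usize], logp: &[Vec<f64>]) -> f64 {
    text.windows(2)
        .map(|w| logp[perm[w[0]]][perm[w[1]]])
        .sum()
}

/// Runs `restarts` independent chains of `steps` swap proposals each,
/// every chain starting from `start`, and returns the best permutation
/// seen together with its log-probability.
fn decode_with_restarts(
    text: &[usize],
    start: &[usize],
    logp: &[Vec<f64>],
    restarts: usize,
    steps: usize,
    rng: &mut XorShift64,
) -> (Vec<usize>, f64) {
    let n = start.len();
    let mut best_perm = start.to_vec();
    let mut best_lp = bigram_logprob(text, &best_perm, logp);
    for _ in 0..restarts {
        let mut p = start.to_vec();
        let mut lp = bigram_logprob(text, &p, logp);
        for _ in 0..steps {
            // Propose swapping the images of two symbols.
            let (i, j) = (rng.below(n), rng.below(n));
            p.swap(i, j);
            let cand = bigram_logprob(text, &p, logp);
            // Metropolis rule: always accept improvements, otherwise
            // accept with probability exp(cand - lp).
            if cand >= lp || rng.unit() < (cand - lp).exp() {
                lp = cand;
            } else {
                p.swap(i, j); // rejected: undo the swap
            }
            if lp > best_lp {
                best_lp = lp;
                best_perm = p.clone();
            }
        }
    }
    (best_perm, best_lp)
}
```

Calling `decode_with_restarts(&text, &start, &logp, 5000, 10000, &mut rng)` would mirror the loop bounds in the hunk above. The restart count trades run time against the chance that at least one chain escapes a poor local optimum.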