Diffstat (limited to 'README.md')
 README.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/README.md b/README.md
index f88de18..a7b45b6 100644
--- a/README.md
+++ b/README.md
@@ -17,9 +17,9 @@ humans cannot provide helpful labels. Deep learning models often perform remarka
We will use a synthetic task to test our hypothesis that models will generalize truthfully off-distribution. The synthetic task is computing the distance between various vertices in an input graph. Our experiment will have three parts:
-1. Pre-train a transformer to predict the distance between two fixed vertices $s,t$ on graphs with $n\in [16, 64]$ vertices.
-2. Fine-tune a transformer to predict the distances between $s,t'$ for any $t'$ which is on the shortest path from $s$ to $t$, but only do fine-tuning on graphs with $n\in [16,32]$ vertices.
-3. Test whether the transformer can accurately predict the distances between $s,t'$ for any $t'$ on the shortest path from $s$ to $t$ for graphs with $n\in [16,64]$ vertices.
+1. Pre-train a transformer to predict the distance between two fixed vertices $s,t$ on graphs with $n\in [8, 32)$ vertices.
+2. Fine-tune the transformer to predict the distance between $s$ and $t'$ for any $t'$ on the shortest path from $s$ to $t$, fine-tuning only on graphs with $n\in [8,16)$ vertices.
+3. Test whether the transformer accurately predicts the distance between $s$ and $t'$ for any $t'$ on the shortest path from $s$ to $t$ on graphs with $n\in [16,32)$ vertices (the distance being predicted is sketched below).
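In all three parts, the label the model must predict is the shortest-path distance in an unweighted graph. As a minimal illustrative sketch, not taken from this repository (the adjacency-list representation and the name `bfs_distance` are our own assumptions), breadth-first search computes this label:

```python
from collections import deque

def bfs_distance(adj, s, t):
    """Shortest-path distance from s to t in an unweighted graph.

    adj maps each vertex to a list of neighbours. Returns None if t
    is unreachable from s.
    """
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None
```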
## Data
@@ -31,9 +31,9 @@ The full input to our model will additionally add the target vertex after the pa
We have three separate datasets.
-- **Pre-train data**: For each $n \in [16,64]$, we will generate several graphs on $n$ vertices. We generate these graphs by inserting $2n$ random edges into the graph. We always set the target vertex to be $2$ here.
-- **Fine-tune data**: For each $n \in [16,32]$, we will generate several graphs on $n$ vertices. We generate these graphs by inserting $2n$ random edges into the graph. We select the target vertex to be a random vertex on the shortest path from $1$ to $2$.
-- **Generalization testing data**: The same as the fine-tune data, except we sample $n \in [32,64]$ instead.
+- **Pre-train data**: For each $n \in [8,32)$, we will generate several graphs on $n$ vertices by inserting $2n$ random edges into each. We always set the target vertex to be $2$ here.
+- **Fine-tune data**: For each $n \in [8,16)$, we will generate several graphs on $n$ vertices in the same way. We select the target vertex to be a random vertex on the shortest path from $1$ to $2$ (a generation sketch follows this list).
+- **Generalization testing data**: The same as the fine-tune data, except we sample $n \in [16,32)$ instead.
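A minimal sketch of how one such instance could be generated, under stated assumptions: the README does not say how self-loops, duplicate edges, or graphs where $2$ is unreachable from $1$ are handled, so the no-self-loop sampling and the resampling loop below are our own choices, and `random_graph`, `shortest_path`, and `finetune_example` are illustrative names.

```python
import random
from collections import deque

def random_graph(n, rng):
    """Graph on vertices 1..n built by inserting 2n random edges.

    Assumption: edges avoid self-loops but may repeat; the README
    specifies neither detail.
    """
    adj = {v: [] for v in range(1, n + 1)}
    for _ in range(2 * n):
        u, v = rng.sample(range(1, n + 1), 2)  # two distinct endpoints
        adj[u].append(v)
        adj[v].append(u)
    return adj

def shortest_path(adj, s, t):
    """One shortest path from s to t via BFS parent pointers, or None."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None

def finetune_example(n, rng):
    """Graph plus a random target vertex on the 1 -> 2 shortest path."""
    while True:  # assumption: resample until 2 is reachable from 1
        adj = random_graph(n, rng)
        path = shortest_path(adj, 1, 2)
        if path is not None:
            return adj, rng.choice(path)
```

For example, `finetune_example(12, random.Random(0))` yields one fine-tune instance with $n = 12$; the pre-train data is the same construction with the target fixed to $2$, and the generalization testing data only changes the range of $n$.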
As a side note, we are also curious whether the transformer learns to generalize to different distributions of graphs, such as denser graphs or graphs with different properties. Time permitting, we will also investigate this.