aboutsummaryrefslogtreecommitdiff
path: root/content/3_Silver/Unordered.mdx
blob: f530ad87ca754ac25ee1ed9e3e0bb34208189b21 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
---
id: unordered
title: Unordered Maps & Sets
author: Darren Yao, Benjamin Qi
description: "With hashing."
frequency: 2
prerequisites:
 - pairs-tuples
---

import { Problem } from "../models";

export const problems = {
    dis: [
      new Problem("CSES", "Distinct Numbers", "1621", "Easy", false, [], "Store every number in a set and print the size."),
    ],
    ex: [
      new Problem("YS", "Associative Array", "associative_array", "Easy")
    ],
    standard: [
      new Problem("CSES", "Sum of Two Values", "1640", "Easy", false, [], "Brute force one value by going through a[], then check if the other exists."),
      new Problem("Bronze", "Where Am I?", "964", "Easy", false, [], "Store all substrings in a map of <string, count>, and then find the longest length such that no substring of that length appears twice."),
      new Problem("Silver", "Cities & States", "667", "Hard", false, [], "Store two maps of counts for the first two letters of a city and state code, then iterate over the cities and use the maps to efficently query for the corresponding counts."),
    ],
};

<Resources>
  <Resource source="IUSACO" title="4.3 - Sets & Maps">module is based off this</Resource>
</Resources>

## What Are Sets and Maps?

A **set** is a collection of objects that contains no duplicates. A **map** is a set of ordered pairs, each containing a key and a value. In a map, all keys are required to be unique, but values can be repeated. Maps have three primary methods: one to add a specified key-value pairing, one to retrieve the value for a given key, and one to remove a key-value pairing from the map. Like sets, maps can be unordered or ordered.

Both Java and C++ contain two versions of sets and maps; one in which the keys are stored in sorted order, and one in which **hashing** is used. Bronze problems shouldn't distinguish between the two, so we'll cover only the latter in this module.

## Hashing

**Hashing** refers to assigning a unique code to every variable/object which allows insertions, deletions, and searches in $O(1)$ time, albeit with a high constant factor, as hashing requires a large constant number of operations. However, as the name implies, elements are not ordered in any meaningful way, so traversals of an unordered set will return elements in some arbitrary order. 

(more in-depth explanation?)

<IncompleteSection />

## Sets

<Problems problems={problems.dis} />

<LanguageSection>

<CPPSection>

The operations on an [`unordered_set`](http://www.cplusplus.com/reference/unordered_set/unordered_set/) are `insert`, which adds an element to the set if not already present, `erase`, which deletes an element if it exists, and `count`, which returns `1` if the set contains the element and `0` if it doesn't.

```cpp
unordered_set<int> s;
s.insert(1); // [1]
s.insert(4); // [1, 4] in arbitrary order
s.insert(2); // [1, 4, 2] in arbitrary order
s.insert(1); // [1, 4, 2] in arbitrary order
// the add method did nothing because 1 was already in the set
cout << s.count(1) << endl; // 1
set.erase(1); // [2, 4] in arbitrary order
cout << s.count(5) << endl; // 0
s.erase(0); // [2, 4] in arbitrary order
// if the element to be removed does not exist, nothing happens

for(int element : s){
    cout << element << " ";
}
cout << endl;
// You can iterate through an unordered set, but it will do so in arbitrary order
```

</CPPSection>

<JavaSection>

The operations on a `HashSet` are `add`, which adds an element to the set if not already present, `remove`, which deletes an element if it exists, and `contains`, which checks whether the set contains that element.

```java
HashSet<Integer> set = new HashSet<Integer>();
set.add(1); // [1]
set.add(4); // [1, 4] in arbitrary order
set.add(2); // [1, 4, 2] in arbitrary order
set.add(1); // [1, 4, 2] in arbitrary order
// the add method did nothing because 1 was already in the set
System.out.println(set.contains(1)); // true
set.remove(1); // [2, 4] in arbitrary order
System.out.println(set.contains(5)); // false
set.remove(0); // [2, 4] in arbitrary order
// if the element to be removed does not exist, nothing happens

for(int element : set){
	System.out.println(element);
}
// You can iterate through an unordered set, but it will do so in arbitrary order
```

</JavaSection>

</LanguageSection>

## Maps

<Problems problems={problems.ex} />

<LanguageSection>

<CPPSection>

In an [`unordered_map`](http://www.cplusplus.com/reference/unordered_map/unordered_map/) `m`, the `m[key] = value` operator assigns a value to a key and places the key and value pair into the map. The operator `m[key]` returns the value associated with the key. If the key is not present in the map, then `m[key]` is set to 0. The `count(key)` method returns the number of times the key is in the map (which is either one or zero), and therefore checks whether a key exists in the map. Lastly, `erase(key)` and `erase(it)` removes the map entry associated with the specified key or iterator. All of these operations are $O(1)$, but again, due to the hashing, this has a high constant factor.

```cpp
unordered_map<int, int> m;
m[1] = 5; // [(1, 5)]
m[3] = 14; // [(1, 5); (3, 14)]
m[2] = 7; // [(1, 5); (3, 14); (2, 7)]
m.erase(2); // [(1, 5); (3, 14)]
cout << m[1] << '\n'; // 5
cout << m.count(7) << '\n' ; // 0
cout << m.count(1) << '\n' ; // 1
```

</CPPSection>

<JavaSection>

In a `HashMap`, the `put(key, value)` method assigns a value to a key and places the key and value pair into the map. The `get(key)` method returns the value associated with the key. The `containsKey(key)` method checks whether a key exists in the map. Lastly, `remove(key)` removes the map entry associated with the specified key. All of these operations are $O(1)$, but again, due to the hashing, this has a high constant factor.

```java
HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
map.put(1, 5); // [(1, 5)]
map.put(3, 14); // [(1, 5); (3, 14)]
map.put(2, 7); // [(1, 5); (3, 14); (2, 7)]
map.remove(2); // [(1, 5); (3, 14)]
System.out.println(map.get(1)); // 5
System.out.println(map.containsKey(7)); // false
System.out.println(map.containsKey(1)); // true
```

</JavaSection>

</LanguageSection>

(iterating over map?)

### Custom Hashing

There is no built in method for hashing pairs or vectors. Namely, `unordered_set<vector<int>>` does not work. In this case, we can use an [ordered map](../silver/intro-ordered) (which supports all of the functions used in the code above) or declare our own hash function.

<LanguageSection>

<CPPSection>

<Resources>
  <Resource source="Mark Nelson" title="Hash Functions for C++ Unordered Containers" url="https://marknelson.us/posts/2011/09/03/hash-functions-for-c-unordered-containers.html" starred>How to create user-defined hash function for `unordered_map`.</Resource>
</Resources>

The link provides an example of hashing pairs of strings. More examples (for pairs of ints)

```cpp
#include <bits/stdc++.h>
using namespace std;

typedef pair<int,int> pi;
#define f first
#define s second

struct hashPi {
	size_t operator()(const pi& p) const { return p.f^p.s; }
};

int main() {
	unordered_map<pi,int,hashPi> um;

}
```

```cpp
#include <bits/stdc++.h>
using namespace std;

typedef pair<int,int> pi;
#define f first
#define s second

namespace std {
	template<> struct hash<pi> {
		size_t operator()(const pi& p) const { return p.f^p.s; }
	};
}

int main() {
	unordered_map<pi,int> um;

}
```

However, this hash function is quite bad; if we insert $(0,0), (1,1), (2,2) \ldots$ then they will all be mapped to the same bucket (so it would easily be **hacked**).

</CPPSection>

</LanguageSection>

## Hacking

<Warning>

You don't need to know this for USACO, but you will need this to pass some of the problems in this module.

</Warning>

In USACO contests, unordered sets and maps generally fine, but the built-in hashing algorithm for C++ is vulnerable to pathological data sets causing abnormally slow runtimes. Apparently [Java](https://codeforces.com/blog/entry/62393?#comment-464875) is not vulnerable to this, however. 

<LanguageSection>

<CPPSection>

<Resources>
  <Resource title="Blowing up Unordered Map" source="CF" url="blog/entry/62393" starred>Explanation of this problem and how to fix it.</Resource>
</Resources>

Essentially use `unordered_map<int, int, custom_hash>` defined in the blog above in place of `unordered_map<int, int>`.

### Another Hash Function

<Resources>
  <Resource source="Benq (from KACTL)" title="HashMap" url="https://github.com/bqi343/USACO/blob/master/Implementations/content/data-structures/STL%20(5)/HashMap.h" starred> </Resource>
</Resources>

```cpp
struct chash { /// use most bits rather than just the lowest ones
	const uint64_t C = ll(2e18*PI)+71; // large odd number
	const int RANDOM = rng(); // random 32-bit number
	ll operator()(ll x) const { 
		// https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
		return __builtin_bswap64((x^RANDOM)*C); 
	}
};
template<class K,class V> using um = unordered_map<K,V,chash>;
```

(explain assumptions that are required for this to work)

</CPPSection>

</LanguageSection>

### `gp_hash_table`

Mentioned in several of the links above. See [Gold](../gold/faster-hashmap) for details.

## Problems

<Problems problems={problems.standard} />