-
Notifications
You must be signed in to change notification settings - Fork 1
/
TODO
447 lines (261 loc) · 12.8 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
-*- outline -*-
* Infrastructure
** Continuous integration
*** OWRT build
- semi-automated full build/update system
- to update the timestamps within the feed/<package>/Makefile's
- facility for fully local build too
(rewrite paths to local ones)
requirement: always git commit, regardless of whether remote/local
** Automated testing on top of CI
*** NetKit+UML topologies + tests
Dream: 'one-command' setup+testing / automated setup+testing+teardown
* Not-exactly-our software - Bugs to fix
** luasocket - no way to use pktinfo etc
Unable to write reasonable any-bound UDP app due to that (it may respond
from different source address than it received packet in). Options:
- do what is portable == bind to specific addresses
- add pktinfo support to luasocket, use that (Linux-only perhaps(?))
** BIRD - doesn't handle too big LSAs gracefully
.. blows up due to rx-buf used for sending being smaller than what is being
stuffed down there.
Workaround: use 'rx buffer large' in config files.
** BIRD - OSPFv3
lsalib.c: 86 => what happens of time moves backwards?
nothing good happens if time moves much forwards, _either_. *sigh*
* HP/DNS - Bugs
** HP - sometimes takes long after boot for some legs mdns to work
I don't know yet why; possibly the re-send-LSA code is a culprit, we try to
resend it, it fails, and we resend it only after force resend interval
again (which is minute currently).
** HP - make sure the mdns cache is 'short' too
Due to lack of POOF etc, we should constrain the TTL in the MDNS cache;
also should never proactively maintain any cache records due to that..
* HP/DNS - Improvements
** DNS server - LLQ support
Currently as Apple allows for it only if and only if IPv4 global is
available (either directly, or via NAT-PMP), I don't personally need
it. However, it would be nice to have at some point.
** HP - profile why it's such CPU hog
*** get rid of double decode in inbound packet processing
*** make sure there isn't too many extra table copy or other operations
** HP - publish router addresses bit better
- on individual links, should publish link-specific link-local addresses
too (facilitates debugging if and when addressing is borked otherwise)
- on individual links, if have addresses of valid type, shouldn't
necessarily publish any _other_ addresses as they cause just extra load,
and we can assume stuff on homenet winds up being routed correctly anyway
** HP - Use IPv6 addresses for internal zone lookups
=> need to have them in OSPF too! (right now have only v4)
** HP - Announce only links on which mdns chatter is detected
** HP - Add multiple upstream router support to proxy(/elsewhere)
** HP - Add browse domain import feature
Caveats: DNS-SD browse domain list update occurs relatively rapidly in
clients (as the PTR we provide has sub-2min TTL), but the DHCP parameter
refresh will take awhile (I can't remember how long lease time we had, but
it's obviously much more than the PTR TTL). So I think the search path
update scheme in general is not that useful, but at least it's there.
If we ever find ourselves _really_ wanting use whole foreign DNS-SD zones
for non-demo purposes (for demo, we can probably just reset DHCP client if
it's necessary), we probably need to implement some sort of 'DNS-SD browse
domain list import' feature, in which the hybrid proxy will query browse
domain of those 'search' type foreign domains, and add them to the local
browse domain list (and always just present the local zone to the clients
as the DNS-SD zone to look for browse domain lists).
As it's additional complexity, and not obvious if it's desirable, it hasn't
been implemented yet.
Additionally, if someone else uses _our_ zone, there can be transitive
import which may not be desirable. So perhaps this is non-feature after
all?
* PA - Bugs to fix
** Make the search domain handling match the hybrid-and-ospf draft
Currently we transport search domain inside JSONBLOB. However, using the
draft-compatible zone definition would be probably more elegant.
** Ensure first-hop routability of our address at border to ISP
ping6 -r <default gateway-link-local>%interface <own global address from
delegated prefix>
(or equivalent) every now and then..
=~ poor man's BFD
** Get stuff going down/up times muuch shorter
- OSPF's 4x hello = dead = 40 secs isn't convincing(?)
** Should depracate addresses for which we don't have egress route
** BIRD - dridd p#2
Duplicate rid detected correctly even if router connected to ~self. But
problem is, it won't behave consistently (=form neigh's with other
routers), as the packets coming with different LL address using same RID on
same link confuse the other party.
* PA - Things to do
** 6204bis compliance
Check where we diverge from it, and check whether divergence is intentional
or accidental.
** Change every 'system state altering thing' to a shell script
Should really browse through pm_*.lua, but at least this:
- ip address changes
=> enables easy way of stopping address changes, or using system-specific
way to do it
** Add convenience mgmt layer to VM topology
- dedicate e.g. eth4 just for management traffic, over which we can use ssh
to play with any host in the topology, ignoring the real topology, it's
flapping, routing, whatever. obviously, routing protocols (and homenet
stuff) HAVE to ignore this interface.
** Try to identify memory leaks, if any
** Add prefix option support
Backward compatible approach: Define 'USP/ASP with options' TLVs. Use those
AND normal ones, until normal support no longer needed => remove normal
support altogether.
- Discuss with Arkko et al what form they should take
- add code to encode/decode new format ones (and do it in parallel with old
format)
(Current hack of options-in-jsonblob is bit iffy)
** update README's
.. few things not correct, I think
- add memory usage caveats
** Cleanup BIRD changes
- whitespace?
+ works with, without lua? with ipv6, without ipv6?
- should automate these tests
** Recursive routing
use ULA's for border nodes in the routing table on non-border nodes
=> less need to refresh the information
.. if not available, fall back to non-ULA lookup
.. each router can announce it's ULA in the infamous JSONBLOB
:--p
** On-link detection
(or smart bridging? could bridge if no routers detected on link..)
** Save assignments
** Code review the code
Comments to self
- use of 'class' as name for class name is bit misleading
- but 'name' might be too popular(?)
- accessors for getting class object, superclass objects would be nice
- could use some sort of ~singleton to save classname=>class,
class=>superclass relations?
- changed semantics so that no implicit copying happening almost anywhere
=> caller responsibility. only way to ensure high performance?
- added validity_sync utility class, which handles a collection's
validity. it has methods for both setting and clearing validity of
everything within (and it works with arrays, maps, multimaps)
- at some point could try to clean up the class inheritance mess -
especially tostring handling is worryingly opaque
- could try to optimize map:count, array:count, array:is with some play
around metatables (is_array flag, perhaps member count too?)
- how to deal with >255 nodes on IPv4 subnet?
TODO codereview
- ipv6s (bit too much spaghetti, perhaps?)
- ~l331 pa.lua+
** split mst.lua
** Comprehensively test PA alg (2/2)
- think more corner cases to test
- _large_ topologies should still converge (and they seem to; should set
up VMs to make sure, though, instead of just the stress test)
+ try to see if we can do again the DHCPv4 bug
+ some sort of mutating topology testcase? which runs long time?
.. make sure every now and then that the design assertions hold,
e.g. one IP address per interface per device, one prefix per link per USP
[ done, stress/elsa_pa_stress.lua ]
* Test topology - Bugs to fix
** Search domain not visible in V6
After adding the V6 name server support, it seems search domain is not
getting populated.
The answer is rdnssd; there is currently no _release_ version which has
RFC6106(?) support, it's just in git there. *sigh*
* Things to do if I have time (hah)
** Use netlink instead of 'ip' command to interact with IP addresses
Plan:
- Wrap a library (e.g. http://www.infradead.org/~tgr/libnl/) with SWIG
- Lua -> C provided by SWIG
- C -> Lua
- Write higher level abstraction library on top of that
.. probably couple of days of coding, all in all, and yet another package
dependency/even more bloated hnet-core (as we need to compile C
extensions). Therefore only 'nice-to-have', not mandatory, yet..
Unit testing using this needs to be considered too; the high level
abstraction API has to be simple, so it can be mocked without too much
pain, as in unit testing we trust..
** More effective handling of multimap memory usage
If just 1 entry, shouldn't use list, but instead just single entry.
** Dynamic border handling (for real)
.. actually configure firewall etc based on detection results.
- run OSPF on all interfaces [trivial, we do that now too]
- run PD on all interfaces [we do this now too, but could do it more
efficiently]
- only OSPF (/nothing) = interior
- only PD = exterior
- rest = mixed, but I'd prefer to treat them as exterior for firewalling
purposes
** Determine if two interfaces are bridged somehow
(two ports, same bridged network => need only one address/...?)
** Design own routing protocol? :---p
- OSPF insanely chatty
- assumption should be:
- link is ~constant
- IF no traffic whatsoever, make sure of bidir reachability (but can trust
e.g. RAs?)
- with traffic, routing protocol shouldn't need to chat _much_
- and chat should be just trickle-based 'life still same for you?
brilliant, mate.'
- subtree synchronization => e.g. using hash trees, with few (or many)
roundtrips depending on how large portions of tree are checked at once
- subtree updates can just send change deltas => no need for synchronization
.. these assumptions work for wired. how about wireless?
- _have_ to be more proactive about verifying people are still on link
- packet loss more of a concern?
** Optimize PA alg
*** Profile it (no premature optimizations)
*** Make sure computational complexity of all ops is minimal
pa.asp lookups - now use rid as key
- iid natural index - used in prefix assignment algorithm (run_if_usp)
! hmmh. iid nonunique though, and rid IS primary key now that I figured
that rid+iid is the unique key (took me a month but .. :-p) perhaps rid is
ok?
- prefix should be also index (binary prefix, probably) => conflict resolution
constant time (check_asp_conflicts)
- rid only relevant for finding own ASPs - we could store them separately
anyway? (get_local_asp_values)
- usp => asp needed - or some other way (find_assigned)
given #usp small, not necessary?
pa.usp lookups - now use rid as key
- containing prefix lookup cannot be optimized (or probably not worth it,
#usp is small)
*** PA alg - phase 2 - enable incremential mode
step 1: LSAs changed, IFs changed (? - mostly LSAs in a network should be
stable)
*** PA alg - phase 3 - enable even more incremential mode
step 2: individual LSA change notifications
(IF changes are hopefully rare enough..)
just get provided with LSA updates
** Document architecture better
- at least what moves, where
- which modules exist
- ...
** DNS with split horizon based on domains
- multihomed, may want to use e.g. cisco.com server X, and for rest Y
** Border detection - DHCPv6 PD integration
Somehow keep probing that DHCPv6 server is present on link.. that may
require luasocket w/ v6?
!!! fairly broken that we have to use SLAAC to get default route on
interface; also making sure it stays valid is awkward.. and should make
sure it's not one of the OSPF talking nodes somehow (...)
I suppose we can play with ND to make sure it stays there (and therefore
DHCPv6 solicit flooding isn't really neccessary?)
*** Which DHCPv6 PD to use?
- ISC has huge dependencies and is big
- dnsmasq too integrated with dnsmasq
<2k LoC
- udhcp in busybox seems bit broken
<2k LoC
.. just implement the state machine in SMC, make it a public reference,
** 6rd sunsetting
right now probably won't work correctly due to ignoring preferences; but
does anyone care :p
.. to do: write testcase which has 6rd+native with same assigned prefix,
and make sure result is sane
** Memory usage optimization
One LUA in system is probably acceptable memory hit; two+ might not be. If
so, have to consider how to slim the count to 1..
Currently ~2MB per process (pm.lua, bird6-elsa), which results in 4MB
memory overheard compared to system with 0 Lua in it.
** Try to get avahi fixed
It joins ipv6 linklocal multicast groups with non-linklocal addresses, and
also uses those as default source addresses when talking with other
hosts. That's just broken.