120103_001.MP3.txt
The Blue Pill. Self-improving AI. Self-improving AI is a meme that has been circulating since
the 1980s. Current proponents of the idea include Bostrom and Omohundro. My own summary goes
something like this. If we get any kind of AGI going, no matter how slow it is and how
buggy it is, we can give it access to its own source code and let it analyze it, clean
up and fix the bugs, and then rewrite its code to be as good as it can make it.
We then start up the slightly smarter AGI and repeat the process until the AGIs get
superintelligent. On the surface, this is irrefutable. We already have examples of systems
improving themselves. We can buy a cheap 3D printer and then quite cheaply print out parts
for a much better 3D printer. Or we can make computer chips that go into computers that design
better computer chips. Not to mention the evolution of all species in nature. I look at it from an
epistemologist's point of view and say: that's a hard-line reductionist idea that should
not have made it out of the 20th century. The idea, at its inception, imagined an AGI
as something that was written by teams of human programmers using software development
tools and mathematical equations. I think the only outcome that even approximates this
scenario is that the code ends up perfect, and humans as well as machines all agree there are no
more improvements to be made. And the resulting AGIs are still not superintelligent. The
most likely outcome is that we all realize the folly in this argument and won't even
try. It's not about the code. The number of lines of code in AI-related projects has been
declining rapidly. 2012: 34,000 lines, Krizhevsky et al., for ImageNet. 2013: 1,571 lines of
Lua to play Atari games. 2017: 196 lines of Keras to implement Deep Dream. 2018: less
than 100 lines of Keras for research-paper-level results. And all of these, except Cyc, included
as the most famous example of a 20th-century reductionist AI system, demonstrate new levels
of power of machine learning. The limits to intelligence are not in the code. In fact,
they are not even technological. The limit of intelligence is the complexity of the
world. Omniscience is unavailable. The main purpose of intelligence is to guess, to jump
to conclusions on scant evidence, and to do it well, based on a large set of historical
patterns of problems and their solutions, or events and their consequences. Scant
evidence is all we will ever have; we don't even know what goes on behind our backs. And
because our intelligence is guessing, I have repeatedly claimed that all intelligences
are fallible. We are already making machines that are better than humans in some aspects
of guessing. Protein folding and playing Go are examples of this. And these machines
will get bigger and better at what they do and will be superhuman in various ways and
in many problem domains, simply based on a larger capacity to hold, look up, or search useful
patterns. The code doing that can be hand-optimized to the point where any AI improvement
would be insignificant. My own code in the inner loop for understanding any language
on the planet, once it has learned it, in inference mode, is about 90 lines of Java.
We can expect at best minor improvements to efficiency and speed. It comes down to the
corpus. In my domain, NLU, simple tests can be scored at 100% after a few minutes of learning
on a laptop. Continued learning for days and weeks would provide a larger sample set of
vocabulary in appropriate contexts, which would mainly correct misunderstandings in
corner cases. But these corpora are not comparable, by several orders of magnitude, to the gathered
life experience of a human at age 25. The main limit of intelligence is corpus size
in an ML situation. Future artificial intelligences will be nothing like what AGI fans have been
fearmongering about. Those are 20th-century reductionist AI ideas, and their proponents are
blind to the most fundamental basics of epistemology. Reductionist, good old-fashioned AI has been
demonstrated to be inferior in its own domains to even semi-trivial machine learning
methods. We need AGL, not AGI. Machines learning to code. As of this writing, there are a handful
of available code-writing systems based on ML technology that have learned from large
quantities of open source code, for example GitHub Copilot, OpenAI Codex, and Amazon
CodeWhisperer. They have not yet surpassed human programmers. But it's not about writing code
either. AIs writing code is about as silly as AI magazine covers with pictures of robots
typing, wink wink. In the future, if we want the computer to do something, we will have
a conversation, speaking and listening, with the computer. The conversation is at the level
of discussing a problem with a competent coworker or professional. It may spontaneously ask
clarifying questions. I call this continuously rolling topic, mixed-initiative dialogue; others
talk of these bots as dialogue agents. But this will go beyond Siri or Alexa: when
the computer understands exactly what you want done, it just does it. Why would reductionist-style
programming be a necessary step? Yes, there will still be lots of places where we
want to use code. But whether that code is written by humans or AIs will make much
less of a difference than we might expect, based on today's use of computers.
The Pink Pill. The Wisdom Salon. Wisdom Salon is an online World Café. The World Café protocol
is a recipe for organizing conversations that matter on a large scale. Thousands of people
can cooperate in order to bring clarity to complex issues. This is a post-mortem summary
for my interrupted Wisdom Salon project. I have all the code in an archive, but it requires
a complete rewrite in order to fix the two biggest problems: the switch from Flash
to HTML5 for video, and the cost of video connections. I know how to fix these, but I'm
busy working on understanding machines. At the moment, I am looking for someone to take
this over. I also observe that there is a need for something like this. I see things discussed
on Quora that would make good topics for a Wisdom Salon. I happen to believe video and
spoken words are important components, for many reasons.
Wisdom. Knowledge and information can easily be found on the web. But what about wisdom?
Intelligence is based on gathered knowledge. Wisdom is based on gathered experience. To
get wiser, seek out more experiences. Engage yourself. Do more stuff. Travel. Talk to people
and share their experiences. Conversation with others is the easiest way to gain wisdom.
But not all conversations are equal. We want conversations that matter. The World Café
Protocol. The World Café Protocol is a recipe for organizing such conversations that matter
on a large scale. Thousands of people can cooperate in order to bring clarity to complex
issues. To find out more, buy the book or study the World Café website. But this is
how it typically works. In some conference facility or gymnasium, the organizers provide
dozens to hundreds of square tables. Each has four chairs, a box of crayons, and a piece
of butcher paper as a tablecloth. Stakeholders from all walks of life get invited and sit
down at the tables. This could be a mixture of farmers, teachers, and politicians. In corporate
environments, sometimes this is everybody in the company. Organizers now unveil a carefully
phrased focusing question as the topic of the conversations. It is important that the
question is positive and focusing. For education reform, don't ask, what is wrong with our
education system? Instead, ask, what could a great school also be? The four people at
each table now start a conversation around the question. Everyone takes notes on the
butcher paper, using the crayons. After 20 minutes, a gong rings. Three people at each table
(everyone except South, in duplicate bridge terms) get up and move to other tables
at random. Three fresh random people sit down at each table. South now first explains
to the newcomers what the notes on the tablecloth mean. This provides a kind of lightweight
continuity from the previous conversation at this table. The three newcomers comment
on these notes and add fresh comments: the best parts of what was said at their previous
tables. These conversations unfold very naturally. Four strangers can easily have a friendly
conversation about complex things that matter. They don't even have to introduce themselves.
They contribute their wisdom and experiences, not their resumes. Conversations now continue
for another 20 minutes. The gong rings again, and the shuffling repeats. After two to three
hours, the session is over, and the butcher papers are gathered by the organizers into
what is called the harvest. They are summarized some time later. Perhaps after lunch,
the results are shared with all the stakeholders. Why does this work so well? Someone pushing a
bad idea of theirs at every table can spam, at worst, 27 people in three hours. A good
idea, introduced at the first table and repeated by all participants at subsequent tables, will
reach over 100,000 people or the majority of the audience, whichever is smaller. This
is the filtering power of the World Café protocol.
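A rough sketch of the arithmetic behind those two numbers, assuming nine 20-minute rounds in three hours, four seats per table, and perfect retransmission of the good idea (an idealization, of course):

    # Back-of-the-envelope model of idea spread in a World Café session.
    rounds = (3 * 60) // 20      # a three-hour session has 9 rounds
    bad_reach = 3 * rounds       # a spammer only reaches 3 listeners per round: 27
    good_reach = 4 ** rounds     # carriers of a good idea roughly quadruple each round
    print(bad_reach, good_reach) # 27 262144, i.e. well over 100,000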
Wisdom Salon is an online World Café. Sadly, the Wisdom Salon project has been suspended
because of changing infrastructure and cost structure for online video transmissions, and
because of lack of time on my part. It is possible to restart the project using current
video technology, and with funding and a larger team. If interested in contributing to this,
please get in touch. What follows is the original high-level design specification, written
in the present tense. Design specification. The Wisdom Salon is a 24/7 online World Café
implemented as a video chat site. Conversations have four participants, but each conversation
can also have a passive and quiet audience of any size. All conversations are always public.
All conversation participants are known by their login identities. Why would anyone want
to participate? The main purpose of Wisdom Salon is increased wisdom and improved clarity
in complex issues for the participants. This is your main benefit. This is why you would want
to participate. You will not get paid, but you might earn a local currency, called Influence,
that you can selectively use to extend your influence.
Goal. The goal is specifically not to find the best grains of wisdom in the harvest.
The grains are there mainly to provide continuity and shorten the time to get to talking about
things that matter. The system is there to provide the users a chance to analyze large
and complex issues with others, in conversation and in exchange of experiences. Do not
underestimate how different an interactive conversation is from a web search or reading a book.
Have you ever spent days studying something without getting it, only to have someone set you
straight in two minutes of conversation? Have you ever been in a meeting where the resolution
is something none of the participants even understood when the meeting started?
Sample questions. What kinds of questions demonstrate the power of the Wisdom Salon?
Consider these samples. I am considering a midlife career change; what matters? Where
should I retire, and why there? Should I pursue a career in engineering or medicine?
Lifestyle design in interesting times. What is the true promise of genetics research, and
why should I care? What movies should I let my children watch, and why?
Musical education for my child: what matters, what instruments, and why? What is it really
like to be a soldier in places like Afghanistan and Iraq? Should I retire in Costa Rica? User
experience. People arrive when they want and leave when they want. They can engage in multiple
ways. Upon entering the site, users are presented with the currently most popular conversation,
the one with the largest audience. Below the conversation, there will be a list of other
popular conversations, headed by conversations and topics the user may have watched or previously
participated in. They can browse all ongoing conversations, much like watching talk shows
on television. They can select from hundreds of questions to find something that interests
them, or add their own. Instead of a butcher paper, they can leave notes on each question,
known as grains of wisdom, to provide the lightweight continuity from table to table.
They can vote on these grains of wisdom so that the better results rise to the top. Results
are immediately visible to all. They can observe what other people say and how they behave,
and modify their own social graph to improve their chances of interaction with the best
people. A local currency is earned by passive engagement, per hour; more of it is earned
by participating in conversations; and the currency is used to pay for the privilege
of posting a comment. Because posting costs currency, spamming the grains of wisdom will
be limited. A topic without currently active conversations still allows you to browse the
grains of wisdom on the topic, and if you have Influence, you can vote on the grains or notes
that you like or otherwise agree with, and you can restart the topic by creating a table
and hoping others will join.
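A minimal sketch of these currency rules; the class name, earning rates, and posting cost below are hypothetical placeholders, not part of the original design:

    # Hypothetical sketch of the Influence currency rules described above.
    class Account:
        def __init__(self):
            self.influence = 0.0
        def watch(self, hours, rate=1.0):      # passive engagement earns a little
            self.influence += rate * hours
        def converse(self, hours, rate=5.0):   # active participation earns more
            self.influence += rate * hours
        def post_grain(self, cost=10.0):       # posting costs currency, limiting spam
            if self.influence < cost:
                raise ValueError("not enough Influence to post")
            self.influence -= cost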
Four main uses of Wisdom Salon. The site enables, but doesn't enforce, the World Café
protocol. You can use the site for several different purposes. As entertainment and education,
passively watching conversations among your peers, much like flipping channels on television.
To get both factual information and broad-ranging personalized advice from experts. To share
your expertise in fields you understand. To do micro-mentoring. To find an audience for
storytelling and sharing personal experiences from your life. To gain wisdom and personal
clarity in complex issues. To debate the major issues of the day, in person, in productively
selected and well-behaved groups. To find new, interesting, and competent friends by observing
their behavior and then befriending them, much like on other social media. Any active
conversation starts a 20-minute clock bar moving. You can leave anytime; the system provides
some incentive to stay the full 20 minutes. On the other hand, you don't have to leave after
20 minutes. If you like, you can continue the conversation as long as you want. But we expect
a large fraction of people to adhere to the protocol. We believe this maximizes the wisdom
gained per session. Without the right people, the system is worthless.
Do not be discouraged. Facebook would be worthless with only 10 people on it. Wisdom Salon really
requires at least 50 people to be on the system before you are likely to find a conversation
around a question you actually care about anytime you join. Nobody knows if this
will work or not, and it may take a while before the system matures enough to attract
a sufficient repeat audience to become what I designed it for. If you don't like it
at first, please try again. It might well improve, and you might get lucky and get into
an amazing conversation when you least expect it. Welcome to my experiment.
The Lavender Pill. Model-free AI. Don't model the world. Just model the mind. It's a lot
easier. With some poetic freedom, I'd like to claim:
1. Model the world: 10 billion lines of code.
2. Model the brain: 10 million lines of code.
3. Model the mind: 10,000 lines of code.
Number one is regular programming. We make computers perform actions in a context
that matches the programmer's mental model of some relevant parts of the world. Number
two is neuroscience-based models of neurons, synapses, and other biological structures and
systems in brains. Number three is epistemology-based models of learning, understanding,
reasoning, prediction, abstraction, and other holistic and emergent phenomena. Epistemology-based
methods require a rather minimal infrastructure to support whatever operations these concepts
require. I put models within irony quotes because they are, strictly speaking, metamodels,
because they are used at meta scales. They are not about skills, such as English or folding
proteins. They are about how to acquire such skills by learning from our mistakes.
The Purple Pill. Corpus congruence. Understanding in brains and machines can be defined and
measured as corpus congruence, and corpus congruence as a metric spans almost all of NLP.
Let's consider this in the machine learning sense. If a machine is model-free, holistic,
as all general understanders have to be in order to not get trapped in a limited model,
then all it ever knows comes from the corpus it was trained on. And all it really can say
is: this is more like my corpus than that. Or: this is more like these documents in my corpus
than those. This metric spans almost all of NLP because most of NLP is doc-sim in various
guises. Given two documents A and B in some corpus, a classifier can say that an unknown
document, which we can call U, is more like A than B. Given this capability, we can build:
classification and clustering, by using A, B, up to N as defining classes; filtering, by
using A, wanted dox, and B, unwanted dox; sentiment analysis, by using A, negative dox,
and B, positive dox; entity extraction, by softly matching terms against lists of known
entities; and doc-sim: find me more documents like this one. Reductionist NLP uses all of
these at the bag-of-words or word-count level, for things like web search, spam filtering,
and clustering.
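A minimal sketch of the "is U more like A than B" primitive at the bag-of-words level, using plain cosine similarity; this is an illustration only, not the author's code:

    # Bag-of-words "is U more like A than B?" using cosine similarity.
    from collections import Counter
    import math

    def bow(text):
        return Counter(text.lower().split())

    def cosine(c1, c2):
        dot = sum(c1[w] * c2[w] for w in set(c1) & set(c2))
        norm = math.sqrt(sum(v * v for v in c1.values())) * \
               math.sqrt(sum(v * v for v in c2.values()))
        return dot / norm if norm else 0.0

    def more_like(u, a, b):
        return "A" if cosine(bow(u), bow(a)) >= cosine(bow(u), bow(b)) else "B"

    a = "cats purr and chase mice around the house"
    b = "stocks rose sharply as markets rallied"
    print(more_like("the cats chase mice", a, b))   # A

Classification, filtering, sentiment, and doc-sim are all variations on this one comparison.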
Holistic NLU aims to do the same based on the meanings expressed in sentences and paragraphs.
But semantic corpus congruence is still corpus congruence. Common sense now becomes: is the
proposition before me congruent with my entire world model, as acquired by learning
from my training corpus? If it is well known, then we can likely ignore it this time. If
it is not, then the next question will be: is it close enough that it might be worthwhile
extending the world model with this information? If the answer is no, then the
input is, by this definition, nonsense. Otherwise it is either a new fact or a lie, but since
we cannot tell, we have to accept it, possibly with a note that this is fresh, untested knowledge
that may turn out to be irrelevant, false, counterproductive, or noise.
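That decision procedure can be sketched as code; the congruence function and both thresholds are hypothetical stand-ins for whatever the understander actually computes:

    # Hypothetical sketch of the common-sense acceptance rule described above.
    KNOWN, EXTEND = 0.9, 0.5    # illustrative congruence thresholds

    def consider(proposition, world_model):
        c = world_model.congruence(proposition)   # assumed to return 0.0 .. 1.0
        if c >= KNOWN:
            return "ignore"                       # already well known
        if c >= EXTEND:
            world_model.add(proposition, status="fresh, untested")
            return "accept provisionally"         # new fact or lie; cannot tell
        return "nonsense"                         # incongruent with everything known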
Next we can note that it doesn't matter whether the documents are text or images, or input
from a point cloud from robot or autonomous vehicle sensors. And finally we can note that this
definition also holds for humans, if we take our corpus to be everything we have experienced
since birth.
Monika's Little Pills
Chapter 1
Why AI Works
Intelligence equals understanding plus reasoning. Interest in artificial intelligence is exploding,
and for good reasons: computers in cars, in phone apps, and on the web can do amazing
things that we simply could not do before 2012. What's going on? This is an attempt
to explain the current state of AI to a general audience, without using mathematics, computer
science, or neuroscience; discussions at those levels focus on how AI works. Here I
will discuss this at the level of epistemology, and will try to explain why it works. Epistemology
sounds scary, but it really isn't. It's mostly scary because it is unknown; it is not taught
in schools anymore, which is a problem, because we now desperately need this branch of philosophy
to guide our AI development. Epistemology discusses things like reasoning, understanding, learning,
novelty, problem solving in the abstract, how to create models of the world, etc. These
are all concepts one would think would be useful when working with artificial intelligences,
but most practitioners enter the field of AI without any exposure to epistemology, which
makes their work more mysterious and frustrating than it has to be. I think of epistemology
as the general base for everything related to knowledge and problem solving. Science forms
a small, special-case subset domain, where we solve well-formed problems of the kind that
science is best at. In the epistemology outside of science, we are free to also productively
discuss pre-scientific problem-solving strategies, which is what brains are using most of the
time. More later. Intelligence equals understanding plus reasoning. In his book, Thinking,
Fast and Slow, Daniel Kahneman discusses the idea that human minds use two different and
complementary processes, two different modes of thinking, which we call understanding and
reasoning. The idea has been discussed for decades and has been verified using psychological
studies and by neuroscience.
Subconscious intuitive understanding is the full name of fast thinking, or System 1
thinking. It is fast because the brain can perform many parts of this task in parallel.
The brain spends a lot of effort on this task. Conscious logical reasoning is the full name
of slow thinking, or System 2 thinking. To many people's surprise, this is very rarely used
in practice. My soundbite for this is: you can make breakfast without reasoning. Almost
everything we do on a daily basis, in our rich mundane reality, is done without a need to
reason about it. We just repeat whatever worked last time we performed this task. It is
real-experience driven. Intuitive means that the system can very quickly provide solutions
to very complex problems, but those solutions may not be correct every time. Logical means
that answers are always correct as long as input data is correct and sufficient, which is
not true in our rich mundane reality. It can only be true in a mathematically pure model
space. If you like logic, you must also like models. Subconscious means we have no conscious,
introspective access to these processes. You are reading this sentence and you understand
it fully, but you cannot explain to anyone, including yourself, how or why you understand it.
Conscious means we are aware of the thought; we can access it through introspection, and we
may find reasons for why we believe a certain idea. Expensive is on the list because brains
spend most of their effort on this understanding part. We really shouldn't be surprised
that AI now requires very powerful computers. More later.
In contrast, reasoning is efficient. It is most useful when you are stuck in a novel
situation or experience and understanding doesn't help you. Or perhaps you need to plan ahead,
or need to find reasons for why something happened, after the fact. It is used at a formal level
in the sciences. Reasoning is important, but just rarely needed or used. Finally, understanding
is model-free and reasoning is model-based. This is likely the most important distinction
to people who are implementing intelligent systems, since it provides a way to keep the
implementation on the correct path when the going gets rough. We cannot discuss these
issues quite yet, but if you are curious you can watch the videos at Vimeo.com, which discuss
this distinction at length. Think of its appearance in this table as a kind of foreshadowing.
All of this groundwork allows me to state the main point of this section. We have known
for a long time that brains use these two modes. But the AI research community has been spending
overmuch effort on the reasoning part and has been ignoring the understanding part for
60 years. We had several good reasons for this. Until quite recently, our machines were too
small to run any useful-sized neural network. And also, we didn't have a clue about how
to implement this understanding. But that is exactly what changed in 2012, when a group
of AI researchers from Toronto effectively demonstrated that deep neural networks could
provide a simple kind of shallow and hollow proto-understanding. Well, they didn't call
it that, but I do. I will look just a little into the future, and overstate this just a
little in order to make it more memorable: deep neural networks can provide understanding.
This new phase of AI took decades to develop, but it would never have happened without people
like the group led by Geoffrey Hinton at the University of Toronto, who spent 34-plus years
developing the deep neural network technology we now call deep learning. A number of breakthroughs
from 1997 to 2006 led to a number of successful demonstrations, including first prizes in
AI competitions in 2012. We therefore count this year as the birth year of machine
understanding. To an outsider, it may look like an older program or phone app might be
understanding whatever the app is doing, but that understanding really only happened in
the mind of the programmer creating the app. The programmer first simplified the problem
in their own head by discarding a lot of irrelevant detail, using the programmer's understanding.
The simplified mental model of the problem domain could then be explained to a computer
in the form of a computer program. What is changing is that computers are now making
these models themselves.
The first bullet point describes regular programming, including old-style AI programs. AI
has, since 1955, provided many novel and brilliant algorithms that we now use in programs
everywhere. But when you contrast old-style AI to understanding systems, the old kind of AI
is basically indistinguishable from any other kind of programming we do nowadays. The second
bullet point describes the recent developments. Deep neural networks are so different from
regular programs that we have to acknowledge them as a different computational paradigm.
This is why they took almost four decades to develop. And the paradigm, being pre-scientific
and model-free, is difficult to grasp if you received a solid reductionist and model-based
education. It takes a long time for an established AI practitioner or experienced programmer
to switch. People who are just starting out in AI have an easier time assimilating this new
paradigm, since they haven't had a full career's worth of experience and success using
old-style AI techniques. The amount of work we have to do to get a deep neural network to
understand is surprisingly small, and companies like Google and Syntience are working on
eliminating the remaining effort of programming neural networks. This is where things will
get really weird. When the deep neural network, DNN, understands enough about the world and
about the problem it is faced with, then we no longer need a programmer to acquire this
understanding.
Let me elaborate. Programmers are employed to bridge two different domains. They first have
to study whatever application domain they are working on. For instance, if they are writing
an airline ticket reservation system, they will have to learn a lot of detailed information
about airlines, airline tickets, flights, luggage, etc., and then know to provide features
for unusual cases, such as cancelled flights. And then the programmer uses their understanding
of the problem domain to explain to a computer how it can reason about these things. But the
programmer cannot make the system understand; they can only put in the hollow and fragile
kind of reasoning, as a program with many if-then cases, and any misunderstandings the
programmer has about the problem domain will become bugs in the computer program. Notice
the shift in terminology. More later.
But today, for certain classes of moderately complex problems, we can use a DNN to automatically
learn for itself how to understand the problem, which means we no longer need a programmer
to understand the problem. We have delegated our understanding to a machine, and if you
think about that for a minute, you will see that that's exactly what an AI should be doing.
It should understand all kinds of things, so that we humans won't have to. And there
are two common situations where this will be a really good idea. One is when we have
a problem we cannot understand ourselves. We know a lot of those, starting with cellular
biology. The other common case will be when we understand the problem well, but making
the machine understand it well enough to get the job done is cheaper and easier than any
alternative. Roombas accomplish this level of understanding using old-style AI methods, but
I predict we will one day be flooded with similar, but DNN-based, devices that understand
several aspects of domestic maintenance as well as we do.
Do machines really understand? If we give a picture like this to a DNN trained on images,
it will identify the important objects in the image and provide rectangles, called bounding
boxes, as approximations of where the objects are. The text on the right says, woman
in white dress standing with tennis racket, two people in green behind her, which is not
a bad description of the image. It could be used as the basis for a test of English skill
level for adult education placement. For all practical purposes, this is understanding. We
had no idea how to make our computers do this before 2012. This is a really big deal. This
feat requires not only a new algorithm, it requires a new computational paradigm. An image,
to a computer, is a single long sequence of numbers denoting values for red, green, and blue
colors, with values from 0 to 255. It also knows how wide the image is.
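To make that concrete, here is a tiny sketch of what an image looks like from the computer's side; the pixel values are made up:

    # An image, to a computer: one long flat sequence of 0..255 color values.
    import numpy as np

    flat = np.array([255, 0, 0,    0, 255, 0,     # two pixels: red, green
                     0, 0, 255,    255, 255, 0],  # two more: blue, yellow
                    dtype=np.uint8)
    width = 2                            # the one extra fact the computer has
    image = flat.reshape(-1, width, 3)   # rows x columns x (R, G, B)
    print(image.shape)                   # (2, 2, 3)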
How does it get from this very low-level representation to knowing that there is a woman
with a tennis racket in the image? This is what William Calvin has called a river that flows
uphill. There are very few mechanisms that can go in this direction, from low levels to high
levels. Calvin used the term to describe evolution, and I can use this quote to describe
understanding. I like to think of evolution as nature's understanding, because the phenomena
are very similar at several levels. Evolution of species can bring forth advanced species
starting from simpler species, in the same manner that understanding is the discovery and
reuse of high-level concepts in low-level input.
In contrast, reasoning proceeds by breaking problems into sub-problems and solving those,
which is a flowing-downhill kind of strategy. In mathematics we accept, and many mathematicians
only accept this reluctantly, that we need to use induction to move uphill in abstractions,
and that's a very limited uphill movement at that. Epistemology allows for much stronger
uphill moves. This is known as jumping to conclusions on scant evidence, and it's allowed
in epistemology-based, pre-scientific systems. As an aside, here's a pretty deep related
thought. In nature, evolution reuses anything that works. I like to think that understanding
is a spandrel of evolution itself. Neural Darwinism certainly straddles this gap. Could
be coincidence, or the only answer that will work at all. More later.
We doubled our AI toolkit in 2012. We can now use these deep neural networks as components
in our systems to provide understanding of certain things, like vision, speech, and other
problems that require that we discover high-level concepts in low-level data. The technical,
epistemology-level name for this uphill flow in processes is reduction, and we'll be using
that term later, after we explain what it means. Let's look at what the industry is doing
with their newfound toys. This is my view of what I think Tesla is doing, based on public
sources, in their self-driving, Autopilot, cars. Cameras feed vision-understanding components
based on deep learning, and radar feeds radar-understanding components. These supply bounding
boxes in 2D or 3D, with additional information like, there's a woman with a tennis racket
ahead, to a traffic-reasoning component that uses regular programming, or some old-style
AI like a rule-based system, to actually control the car based on the vision and radar
inputs and the driver's desires.
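A skeletal sketch of that hybrid architecture, understanding components feeding a reasoning component; every type, label, and rule here is invented for illustration and is not Tesla's actual design:

    # Hypothetical sketch: DNN understanding components feed a rule-based reasoner.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class BoundingBox:
        label: str          # e.g. "pedestrian", "car"
        distance_m: float

    def vision_dnn(camera_frame) -> List[BoundingBox]:
        # Stand-in for a trained network; returns what the DNN "understands".
        return [BoundingBox("pedestrian", 25.0)]

    def traffic_reasoner(boxes: List[BoundingBox], driver_wants: str) -> str:
        # Old-style, rule-based reasoning over the understood scene.
        if any(b.label == "pedestrian" and b.distance_m < 30 for b in boxes):
            return "brake"
        return driver_wants

    print(traffic_reasoner(vision_dnn(None), "cruise"))   # brake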
But this is not the only possible configuration. George Hotz at Comma.ai, a team at NVIDIA
Corporation, and the DeepTesla class at MIT are using a simpler architecture, with just a
neural network that implements lane following and other simple driving behaviors directly,
in one single deep neural network. There's room for improvement, but it's a big step in the
direction we want to move in. Future automotive systems will likely integrate everything
about driving into one single neural network, or something that effectively behaves as one:
vision, traffic, the car itself, including various functionality like windscreen wipers,
lights, and entertainment, how to drive in a safe and polite manner, and understanding the
driver's or car owner's desires. And if we've gotten that far, then it is a given that we
will have speech input and output, so that the driver can have a conversation with the car
while driving, and can just advise it in case it does something wrong. We are nowhere close
to this today. But after a DNN breakthrough or two, who knows how quickly these kinds of
systems become available. We can already see an increasing stream of new features built
using understanding components.
This article, and the next, are expansions of a talk given on June 10, 2017, at the San
Francisco BIL Conference. A decade ago I created artificialintuition.com. I now have
a lot more to say, but I need to split this meme package into digestible chunks. This
takes a lot of effort to get right. If you liked this article and would like to see more
like it, then you can support my writing and my research in many ways, small to large.
Like and share these ideas with someone who might want to invest in Syntience Incorporated,
or might be otherwise interested in my research on a novel language understanding technology
called Organic Learning. More on that later. I do not receive external funding from any
investors for this research. You can support my research and writing directly at the donation
section at artificialintuition.com.
Chapter 2
Our Greatest Invention, Model-Based Problem Solving
The first chapter, Why AI Works, provided the big picture of AI and understanding
machines. Next we will focus on how to actually implement understanding in a computer. But
before we can attack that core issue, we need to simplify the journey a bit by defining
four important words and concepts. I'll define one in this section, two in the next, and
the concept of reduction after that. We can then discuss the epistemology-level algorithm
for understanding itself. If you are already familiar with these concepts, just check the
for understanding itself. If you are already familiar with these concepts, just check the
headings and definitions that follow in order to ensure we are using these words roughly
the way you use them. You may have noticed I write certain, sometimes common words,
such as model, with an uppercase first letter. This means I am using the word in a technical,
well-defined, unchanging sense. I will define all such technical terms over time and I will
try not to use these terms until I have defined them. We define 11 such terms in the first
chapter, starting with understanding and reasoning. A dictionary of defined terms is in the works.
Models are simplifications of reality. In epistemology and science, models are simplifications
of reality. Our rich mundane reality is too complex to lend itself directly to computation.
In old TV science fiction shows, we would sometimes hear: and then we fed all the information
into the computer, and this is what came out. Well, not anymore. Audiences now know that's
not how regular computers work. Consider an automobile. It consists of thousands of parts,
each with properties like materials, size, color, function, and sometimes complex interactions
with other parts. What's all the information here? We can't just feed all of those properties
and measurements and facts into a computer and expect to get an answer. We need to ask
a question, and we also need to simplify the problem so that we can feed in just the facts
or numbers that matter, so that our question can be answered with minimum effort.
How do we do that? We must identify or create, first in our minds, a very simple model of
some sort of a generic automobile, and use that model for our computation. After we get the
answer for the pure and simple model case, we apply the answer, with some care, back
to our complex reality, where the real automobile and the problem exist. What kind of model
we choose depends on our goals. As an example of a model, Newton's second law states that
force equals mass times acceleration, F = ma. This equation is a classical scientific
model. If we measure the mass and acceleration of a car, then we can estimate how many
horsepower the engine has. To use this equation, we engineers would model, in our minds,
the car as a single small point mass, with all the mass of the car in that point. Because
if we don't, then we'd have to worry about the car rotating, and other problems.
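As a worked example, with made-up numbers: a 1,500 kg car accelerating at 3 meters per second squared while moving at 20 meters per second needs 4,500 newtons of force and about 90 kilowatts at that instant, roughly 120 horsepower:

    # Worked example: estimating engine power from F = m*a (made-up numbers).
    m = 1500.0        # car modeled as a point mass, kg
    a = 3.0           # measured acceleration, m/s^2
    v = 20.0          # speed at the moment of measurement, m/s

    F = m * a         # Newton's second law: 4500 N
    P = F * v         # instantaneous power: 90000 W
    print(P / 745.7)  # about 121 horsepower, ignoring friction and drag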
This is how model-based science works. One or more scientists somehow derive a model for
some phenomenon. The model is published as an equation, a formula, or a computer program.
Scientists and engineers anywhere can now use this equation, program, or model, treating it
as a quick shortcut that works every time, as long as they have correct input data and are
competently applying the formula to a suitable problem in their reality.
Our greatest invention. Model-based problem solving, aka reductionism, is the greatest
invention our species has ever made. The general strategy of simplifying problems before
solving them must be tens of thousands of years old. In some sense, it is a prerequisite
for all other inventions, including the use of fire. If you see a forest fire, then you need
to first imagine the utility of fire, as a model, before you can figure out that it might
be useful to carry home a burning branch. We don't think of this problem-solving strategy
as an invention because it is already ubiquitous in our lives. We are all taught how to use
model-based problem solving in school, when we start solving story problems in math class,
but most people never learn the names of these strategies and are missing the big
epistemology-level picture. This rarely matters, until you start working with AI, where lack
of an epistemological grounding may lead you astray, into failing strategies. These little
pills are an attempt to remedy that.
Model-based methods were examined and refined into scientific methods over the past 450
years. Science is now a collection of thousands of models that, taken together, allow
science-competent people to solve problems quickly and efficiently, without having to redo
all the work that scientists, like Newton, put into creating these models in the first place.
And the sum total of those models covers many problems we want to solve scientifically,
such as how to build a bridge or travel to the moon. This reuse is what makes science
so effective. But not all sciences can benefit equally from this model-making. It works well
for physics, chemistry, and most of biochemistry. As I'm fond of saying, physics is for simple
problems. But as you get to more and more complex sciences, as you get further away
from physics and closer to life, it gets harder to make decent models. The models used by,
for instance, psychology, ecology, physiology, and medicine are generally more complex but
also less powerful than models in physics. Given some solid data, a physicist can compute
the mass of the proton to six decimal places, but we would have a harder time predicting
the number of muskrats in New England next summer, because that outcome depends on millions
of parameters. The life sciences base many of their models on statistics. Statistical
models are among the weakest models used in science. These statistical models are used when
more powerful models with better predictive capabilities cannot be used, for complexity reasons.
Models are: hypotheses, which are unverified models; scientific theories, which are models
verified by peer review; equations and formulas; complex scientific models, such as simulations
of climate, weather, etc.; naive models that we create to simplify our own lives; and computer
programs. And what is mathematics? It is a system that allows us to manipulate our models to
cover more cases. Mathematics is the purest, most context-free of all scientific disciplines.
As such, its greatest value to humanity is in its role as a helper discipline to all other
disciplines. Einstein's famous E = mc squared model was derived using mathematical manipulation
of other models known to Einstein at the time. But perhaps mathematics isn't as much a
scientific discipline as an epistemological one. I may explore this aside later. Model use
requires understanding.
A good model is context-free, since that maximizes the number of contexts it can be applied in.
Newton's second law, F = ma, works pretty much everywhere we have forces, masses, and
accelerations. The trade-off for this flexibility is that we ourselves need to understand the
problem domain. In rocket science, when maneuvering in space, F = ma will often work perfectly.
But when you are applying it to the acceleration of your car, you need to account for lots of
effects, like friction between the road and the wheels, wind resistance, and the like.
So F = ma, applied naively, would give you the wrong answer if friction is involved.
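Continuing the earlier car example with made-up numbers: if drag and rolling resistance consume, say, 900 newtons, the naive F = ma estimate understates what the engine must supply:

    # Naive vs. friction-aware use of F = m*a (illustrative numbers only).
    m, a, v = 1500.0, 3.0, 20.0    # kg, m/s^2, m/s
    losses = 900.0                 # assumed drag + rolling resistance, N

    F_naive = m * a                # 4500 N: what the bare model says
    F_engine = F_naive + losses    # 5400 N: what the engine actually supplies
    print(F_engine * v / 745.7)    # about 145 hp instead of about 121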
This demonstrates the main disadvantage of models. They require that both the model maker,
scientists like Newton, and the model users, STEM-competent people everywhere, understand
enough about the problem domain to know whether the model is applicable or not, and how to
use it. This understanding is the expensive part of science, since using science requires
first getting a solid science education, in order to avoid mistakes when using models.
And since models require understanding, they cannot be used to create understanding.
This is a major problem for AI implementers.
Chapter 3
Two Dirty Words
Reductionism is the use of models. Holism is the avoidance of models.
Models are scientific theories, hypotheses, formulas, equations, superstitions,
and most computer programs.
Reductionism and Holism. After having sorted out what models are, we can now discuss two
complementary problem-solving strategies, or perhaps meta-strategies. They are in many ways
each other's opposites, but the classification can become an argument about levels and
definitions. I will initially pretend the division is clear and obvious, and will elaborate later.
Reductionism is the use of models. In this series we will use exactly the above definition of the
word, reductionism. If you look up the definition elsewhere, you may find that some sources divide
the strategy into sub-strategies. They also seem to miss the most important sub-strategy,
which we'll discuss later. But what all these sub-strategies have in common is that they all
provide ways to simplify observations of fragments of our rich mundane reality into much simpler
models, which we use for reasoning, computation, and sharing. Reductionism is so central to how
we do science, with its heavy reliance on models such as theories, equations, and formulas in
physics, chemistry, etc., that we can speak of model-based sciences, or reductionist sciences,
where such model-making is easy and effective. This classification excludes those sciences,
like psychology, where such model-making is difficult and less often rewarded with reliable
results. After considering all the advantages of models, we might wonder why we even bother
discussing anything else. To many people, especially those with a solid STEM, science, technology,
engineering, and mathematics, education, it may well look like the only choice.
But there's also the other strategy.
Holism is the avoidance of models. This is where the questions start. This is where the
paradoxes surface. This is where your worldview may get shaken up. Seriously, especially if
you are a scientist or engineer with a solid STEM education and decades of professional success
using science and models. In some sense, the goal of this entire series is to demonstrate that
we need to use both problem-solving strategies when creating our artificial intelligences,
because that is what it is going to take. We need holistic understanding. We established that
in the first chapter. As a sample of the new ideas that we will have to deal with, I will
just mention:
Reasoning is reductionist. Understanding is holistic.
Neural networks are holistic.
Holistic systems can jump to conclusions on scant evidence.
Holistic systems can themselves know what is important and what isn't.
Holistic systems can solve problems we ourselves cannot or don't care to understand.
Holistic systems are model-free. They do not use any a priori models of any problem domain.
Reasoning systems inherit all problems and benefits of reductionism.
Understanding systems inherit all problems and benefits of holism.
Humans are born holistic. Humans each solve thousands of little
problems every day, and we are solving almost all of these problems holistically, using
understanding, and without a need to reason at all. This includes fluent language use.
A STEM education instills a strict reductionist discipline in order to mitigate problems
with the fallibility of holistic human minds. Our intelligences are fallible.
These claims all deserve individual treatments, and we'll get to all of them in later sections.
But the major theme is clear: humans are mainly holistic problem solvers.
This must be true for our artificial intelligences as well. We had several reasons for focusing
on reductionist methods, models, and reasoning during the first 60 years of AI. Our computers
were too small to make neural networks work at all. But there were also ideological reasons.
AI was born out of the math and computer science departments of our universities, and therefore
most of the people working on AI were solidly oriented towards the goal of creating a
logic-based, reductionist, infallible artificial mind. To build early AIs, like expert systems,
we entered rules or programmed in lots of facts to reason about. But this was building
reductionist castles in the air, comprised of unanchored facts that didn't tie to any
understanding whatsoever. The troubles with classical AI, such as brittleness, the tendency
to make spectacular and expensive mistakes at the edges of their competence, can be directly
traced to the lack of foundational understanding to support these attempts at reasoning.
Understanding machines will not suffer from this brittleness, but will fail gracefully at the
edges of their competence, much like humans. Most of the time they will know the answer; beyond
that, they will guess, and the guesses they make are based on a lifetime of experience, gained
through learning from a large corpus, and so they have a good chance of being at least a
workable choice, if not perfect.
How can anyone solve problems without using models? A lot of people coming from a STEM
background cannot even imagine how to solve problems without using models. But it's not
hard, once you understand the difference. Mostly, it's a matter of doing what worked last time.
The problem is now figuring out whether we are in a situation that's similar enough that it will
work again. This is mostly a pattern matching problem.
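A minimal sketch of doing what worked last time, rendered as nearest-neighbor pattern matching over remembered situations; the feature encoding and the remembered cases are made up:

    # Model-free problem solving as nearest-neighbor lookup over experience.
    import math

    memory = [   # (situation features, action that worked) - hypothetical data
        ((0.9, 0.1), "take umbrella"),
        ((0.1, 0.8), "wear sunscreen"),
    ]

    def act(situation):
        # Reuse the action from the most similar remembered situation.
        features, action = min(memory, key=lambda m: math.dist(m[0], situation))
        return action

    print(act((0.8, 0.2)))   # "take umbrella": a fast, fallible guess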
More later. What's the result? The holistic answer is a quick guess at the best action,
based on experience with similar situations. Most of the time it's correct, sometimes it's
a little wrong, and every now and then there's a noticeable mistake. And if we get things
a little wrong, we may notice the outcome and correct the action. We learn from our mistakes.
If we practice something a lot, we will start doing it effectively and perfectly every time.
Do we learn faster if we make more mistakes? Should we make mistakes on purpose? More later.
In situations where you cannot use models, which are more common than many realize, the
holistic guess may also be your only option. Conversely, if you have an adequately well-working
model-based solution, just use that. My video, Model-Free Methods Workshop, demonstrates how
a group solves four different problems, at a high level, using both reductionist and
holistic methods.
Why are these dirty words? Well, they are not dirty to epistemologists. Reductionism has been
the default problem-solving paradigm because it's the one that has to be taught. We are born
with a holistic problem-solving apparatus, but reductionist science doesn't come naturally.
Therefore, it has to be taught in schools, practiced, and carried out according to certain
rules. Perhaps that's why the sciences are called disciplines: because following the ideal
scientific method requires practice and constant vigilance. Jan Smuts' book, Holism and
Evolution, 1926, established the terminology in the epistemological literature. Erwin
Schrödinger wrote What is Life?, 1944, questioning the power of physics to provide useful
explanations to the life sciences. Robert Pirsig's Zen and the Art of Motorcycle Maintenance,
1974, contrasted something very holistic, Zen Buddhism, with something very reductionist,
motorcycle maintenance. So the chasm between the strategies was identified a long time ago.
The strategies are each other's opposites. Holism-based strategies for understanding can
handle many important kinds of complexity and can quickly provide a guessed answer. But these
guesses are fallible, and often more expensive to compute. Reductionist education and
strategies brought the benefits of cheap model reuse and formal rigor to improve correctness,
but they cannot handle complexity and are therefore dependent on an external understander
to determine applicability in real-world, complexity-rich situations. And as part of that
education, we are told that holistic methods, such as jumping to conclusions on scant evidence,
are bad, in spite of the fact that our brains use holistic methods thousands of times each day
to successfully understand the environment we live in. We can all use either strategy as
appropriate. If we don't have a STEM education, we will still sometimes make naive models.
But sometimes there is a choice, and different people may prefer one or the other.
When playing pool, some people estimate and compute bouncing angles, and some people shoot
by feel. But we have our preferences, and it may be tempting to label a person with an overly
strong preference as a holist or a reductionist. This is sometimes received badly, if perceived
as a limitation. Some dictionaries even flag reductionist as derogatory. And yet, some people
use it as a self-assigned label. I try to use these terms only as shorthand for a person with a
stated strong preference for holistic or reductionist methods. The two terms were very useful
in epistemology. But then someone invented the concept of holistic medicine.
Instead of just treating a single medical problem, you analyze the patient's entire situation,
attempting to account for diet, exercise, sleep, work habits, stress levels, allergies, family,
friends, and environmental poisons. A good idea, in general. But the wide scope was unmanageable
by the traditionally reductionist medical establishment, and the idea faded away. Instead, the
whole idea of holism became tainted as woo-woo when the term holistic medicine became associated
with woo-woo merchants selling crystals and aromatherapy. As explained above, holism is the
avoidance of models, or, better phrased, holism is the meta-strategy of avoiding a priori models
of the problem domain. That extra precision rarely matters. There's nothing woo-woo about it.
It does say, science not required. But then, you can make breakfast without reasoning. It is
important to note that holistic methods are based on a lifetime of experience, in humans, and
a training corpus worth of experience, in neural networks. When you're making breakfast, you
are relying on this experience, mostly repeating whatever worked yesterday. Some people claim
they use reasoning while making breakfast, but they can make their breakfast while speaking to
someone else on the phone. And as they hang up, they find themselves suddenly sitting at the
breakfast table with their coffee and hot oatmeal. Same thing when driving to work. You may get
lost in thought, and then you find yourself parked at work. You didn't need to reason, since all
sub-problems that occur in driving had been solved multiple times, during years of driving.
Subconscious understanding is used for simple things, like sequencing our leg muscles as we
walk. You have no idea how you are doing that; it just works. Same thing with vision. You
understand that you are looking at a chair, but you do not have conscious access to the 15th
rod or cone pixel to the left of your center of vision, and you have no idea how this
understanding works. Same thing with understanding and generating language. You do not have
any explanation for how you are able to understand the meaning of this sentence.
Understanding is subconscious
and holistic. So for the majority of things we do every day, we do not need reasoning or
reductionist methods. Some people would like to think they are logical thinkers, immune to
most cognitive fallacies, but whether they are or not, at the lower levels, everyone is solving
most of their problems holistically. I claim that reductionist reasoning requires holistic
understanding. In other words, I need to understand the problem domain at hand before I can create
and reuse models to enable me to reason about the domain. So holistic understanding is much
more important than reductionist reasoning because it is the most used strategy, by far,
and the former is also a prerequisite for the latter. But the fallibility of holistic understanding
forced us to create reductionist science and to teach it in STEM education. It is as if the purpose
of science is to keep holistic guessing in check, but this aversion to fallibility has a cost,
because it means complexity-bound and irreducible problems cannot be solved, like language
understanding, global resource allocation, and social interactions. Reductionism and model-based
science appeared around 1650, after a century of gestation. Excluding minor Romantic interludes,
it has held its position as the dominant paradigm for about 400 years. This is changing.
The reductionist train is running out of track. The remaining hard problems facing humanity
are problems of irreducible complexity in domains where reductionist model-based methods
simply cannot work. Whether we like the idea or not, we need to accept these holistic methods
into our AI toolkits. Starting now, we will use these methods either in their raw form,
as model-free methods, or as understanding machines at any level from component to robot
co-worker. Chapter 4. Reduction. Epistemic reduction is a process that discovers higher-level
abstractions in lower-level data by discarding everything at the lower layer that it recognizes
as irrelevant. We have seen the power of models. We have introduced the two problem-solving
meta-strategies of reductionism and holism. We also noted that the creation and use of models
requires an intelligent agent that understands the problem domain. Someone or something has to
perform the reduction. I will now discuss reduction in some detail. Until 2012, only humans and other
animals with brains could perform reduction. Now our deep neural networks, DNN, can perform
limited reduction. How do brains and DNNs accomplish this? And how can we improve these algorithms?
This may be, to some readers, the most rewarding part of this series, because it provides you
the opportunity to learn a new and useful skill. Most people never think about the world at this
level. Knowledge of reduction provides a new point of view that you can use to better understand
your environment, other intelligent agents around you, and modern AI systems.
Definition of reduction. Reduction is a process that discovers higher-level abstractions
in lower-level data. We will initially note that reduction is exactly the same as abstraction.
Why do we need a new word? Because the term abstraction is mostly used
by scientists already operating in a pure model space, seeking a higher level of abstraction
in that space. But to them, abstraction is something that just magically happens in their
heads, since there are no scientific theories for how abstraction works. There cannot be,
since abstraction is a concept in epistemology, not science. AI researchers are starting from
something much closer to a rich mundane reality, where there is a lot of confounding context.
We are solving the meta-problem of how to move from there into a space that is sufficiently
abstract to solve the problem at hand. Here, reduction is a much more appropriate term.
We can abstract the red pixel or the letter B, but we can reduce a rich context containing
that pixel or letter into a higher-level concept. We are swimming in reduction.
Paradoxically, one of the hardest things about teaching reduction is that we don't see the
need to learn about it because we all do it all the time, every millisecond, and the resulting
reductions, models, become available to our conscious minds as if by magic. Brains reduce
away 99.999% of their sensory input, but this process is subconscious and hence invisible to us.
The situation is much like that of the proverbial fish swimming in water. We are all masters of reduction,
but we don't know how we do it or that we even do it. We didn't know this would ever matter.
And generally, it doesn't. Well, it matters in epistemology, and it matters in AI,
since we need to actually implement that magic. We as epistemologists must know how abstraction
is actually performed, and we give the epistemology-level equivalent of abstraction the name
reduction, because that's the recipe for how to accomplish it. We reduce our rich mundane
reality by discarding, reducing away, what's irrelevant. And by using the name reduction,
we, as AI epistemologists, keep reminding ourselves how it is properly done.
Consider the following descriptions of a car. The slide is meant to be read from the bottom up,
to match abstraction levels from low to high. If I'm driving to work, I had better be driving my car.
If the police are looking for a stolen car, they would be looking for a red 2010 Toyota Celica.
If I'm buying a new car, then I might be looking for just a new Toyota Celica.
And a self-driving car would likely only need to understand whether an obstacle is a vehicle or
not, in order to model maximum speed for future movement. We see that we want to pick the appropriate
level of abstraction to deal with the same object, or topic, in different situations.
But more importantly, we see that we can get from a more detailed description,
at the bottom, to a more generic one, higher up, by simply discarding some detail.
I hasten to point out that reduction is more complicated than this simple example of decreasing
specificity shows. But we need to start somewhere, and this image allows us to form intuitions that
will serve for a while. True reduction involves operations like shifting from syntax to semantics
or from instance to type. The appearance of car as an abstraction of Toyota, and the step from
my Toyota to a Toyota, illustrate these steps. Algorithms for these things are known.
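To make the discarding operation concrete, here is a minimal sketch in Python. It is entirely my own framing of the car example, not anything from the original slide; the property names are invented for illustration.

```python
# A toy sketch of reduction as plain discarding of detail.
# Each level of abstraction keeps fewer properties of the same car.

description = {"owner": "mine", "color": "red", "year": 2010,
               "make": "Toyota", "model": "Celica", "kind": "vehicle"}

def reduce_description(desc, keep):
    """Discard every property not listed in keep."""
    return {k: v for k, v in desc.items() if k in keep}

# Bottom-up levels of abstraction, as in the example above:
driving_to_work   = description                                         # my red 2010 Toyota Celica
stolen_car_search = reduce_description(description,
                                       {"color", "year", "make", "model"})
car_shopping      = reduce_description(description, {"make", "model"})  # a Toyota Celica
obstacle_check    = reduce_description(description, {"kind"})           # just: a vehicle
```

Each step up simply throws information away; nothing is added.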
Salience. Part of the trick is to know what to discard. At each level of abstraction,
something can typically be identified as the least important property. Red and Celica are more
significant than 2010 for anyone looking for a car. If we had started from my red 2010 Toyota
truck, then the word truck would not be discarded until the top level. Reduction requires understanding
what's relevant. In reduction we keep that which is salient. More later. Partial reductions.
Most of the time we do not perform reduction all the way to models. I cannot stress this enough.
We discuss reduction to models for pedagogical reasons. It is easy to initially see the context
free model as the goal of reduction. In reality, in brains, we can stop reducing the moment we
recognize that we have a working answer or response, such as a command to contract some muscle or
having understood the meaning of a sentence subconsciously. At this point, there is still
some residual context but we use that context productively rather than discard it to move
to higher levels. Some people claim we use models for all our thinking, but I'm using capital-M
Model only to describe a completely context-free abstraction. F equals ma is an example of that.
There is no need to check whether a car is a red car or a Toyota. The equation works not only for
all cars but for all forces, masses, and accelerations. We might come up with a special equation for
the acceleration of Tesla cars, which would require different inputs like battery charge level
and software settings. That would not be a context-free model, since it would not work on a Toyota.
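A minimal sketch of this contrast, where the Tesla-specific function and its inputs are invented for illustration and are not a real model of any car:

```python
def acceleration(force_newtons, mass_kg):
    """Context-free: F = ma rearranged. Works for any car,
    and indeed for any force and any mass whatsoever."""
    return force_newtons / mass_kg

def tesla_acceleration(battery_charge, sport_mode):
    """Context-bound: a made-up model that only makes sense for
    one kind of car. Feeding it a Toyota is meaningless."""
    boost = 1.2 if sport_mode else 1.0
    return 4.0 * battery_charge * boost
```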
For almost all tasks, basically everything except science, and even there only rarely,
we perform only as much reduction as is necessary to get the job done. When learning to ski,
you only figure out how you yourself need to perform given your body and equipment.
We do not need to parameterize our skiing skills for someone with twice the body mass
because that would be useless to us for the purpose of our own skiing. But a scientist would
have to go that far in order to parameterize away one more piece of context from the model
they are creating, for instance, when creating a skiing video game or designing a new ski.
If we consider the enormous amount of subconscious activity that happens in the brain,
we can safely say that partial reductions are the most common reductions. For instance,
when we take a step forward, our subconscious has analyzed our posture and velocity by using
reduction based on low-level nerve signals, and is commanding leg muscles to contract in
a precisely timed sequence. This activity is something we are unaware of. Most of us don't
even know what leg muscles we have. And there would be no time to perform reduction all the way to
models. That process takes a minimum of a half second and you don't have that kind of time
available to respond to an imbalance when walking or skiing. Reduction in society.
Most of us get paid to understand whatever we need to understand in order to perform our jobs.
In other words, most of us get paid to do reduction. If you are approving building permits,
you reduce a stack of forms to a one-bit verdict of approved or rejected. We excel at reduction,
and this is the main reason most of us haven't been replaced by robots.
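As a toy sketch of that one-bit reduction, with entirely hypothetical form fields:

```python
def approve_building_permit(forms):
    """Reduce a whole stack of forms to a single bit: approved or rejected."""
    return (forms.get("zoning_ok", False)
            and forms.get("fire_code_ok", False)
            and forms.get("fees_paid", False))
```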
But we see that when future understanding machines can perform reduction by themselves,
then we are unlikely to get paid for it. Levels of reduction.
Suppose a young man and a young woman fall in love, something happens to mess it all up,
and then they sort this out and reunite. This is what happened in the man's
rich mundane reality. Suppose the man wants to share this experience, because there was some
moral to the story that he thinks would be interesting to others and possibly important.
He could analyze what happened and figure out which were the key events in the saga and then
have actors on a stage re-enact the story as a play. This is a reduction because the boring parts
of the story would not be part of the play. They are discarded as irrelevant, but the story would
be acted out by real people in front of a live audience. If you are in the audience, you can move
your head to see behind any actor on the stage and you can clearly see everything on the stage,
not just one actor speaking at a time. He could make a movie about it. Now your point of view
is pre-defined by the camera angle and cropping. You can no longer see behind an actor, and you
can often only see those actors that are involved in the main action. He could write a book about it.
We no longer can see even the people described in the book, except in our imagination.
A critic reviewing the theater play may reduce it to: boy meets girl, boy loses girl, boy gets
girl. A drama school graduate may summarize it as a double reversal plot. This is a description
so free from context, it doesn't even specify boys or girls, that it could be argued it qualifies
to be called a model. Plays, movies, books, stories, tropes, etc. are all partial reductions of
reality, and some are more reduced than others. Just like in the red Toyota case, we need to find
the appropriate level of abstraction to work with. The young man in the example, when writing a
book or a screenplay, has much in common with a scientist trying to describe something in nature
in a reusable context free manner by reducing it to a model. They are model makers, or are at
least performing partial reduction. They are discarding the irrelevant bits. The opposite of
reduction. We also need to be able to move in the opposite direction, from models to reality,
or at least from more abstract partial models to partial models closer to reality. When an actor
is given a screenplay, they know it only contains rough directions for what to do and what lines
to say. The actor's job is to give a little of themselves to flesh out the screenplay to actual
actions, including creating, synthesizing, the appropriate display of emotions, tone of voice,
and body language. They use their experience as people and as actors. They use elements of their
past lives and skills they have acquired by training to create something people in the audience
might relate to. For example, they may repurpose a personal experience: he is as sad as when my
hamster died. They use things they learned in drama school, such as speaking, singing, dancing,
and swordplay; cues from other actors, what would Bogart do; and material from fiction, from other
movies and plays, etc. The actor is an artist who conveys whatever the script intends to convey:
emotions, a morality cookie,
a political position, titillation, surprise, and so on. Starting from the simple model,
the screenplay, their job is similar to an engineer's when they are faced with a problem
and use a model to solve it. The engineer would use their experience to decide that
M is the mass of the car and not the tire pressure. The actor decides that sadness
is more appropriate than grief for a certain scene, etc. I call this process, which is the
opposite of reduction, by the name used in problem solving: application. We use a model to
simplify a problem situation, moving it into an abstract and pure model space. We solve the
problem there, by performing math perhaps, and then apply the answer back in our rich reality
to the problem we are trying to solve. Many of you may recognize the word application or
its abbreviation, app. That's not as far-fetched as it might seem. Apps are software-based models.
Reduction and application in brains. Back to the issue of partial reductions.
Consider the actor reading a screenplay. They are using their eyes to gather pixels of color
and orientation. The brain then performs pattern matching, reduction, from these low-level signals
to letters, words, to language, to high-level concepts like love and separation, and eventually
to a high-level understanding of the playwright's intents. The actor then takes this high-level
understanding and by performing application, they add their own experience to the script
to get closer to reality in their performance. Our brains are capable of moving up and down
many levels of abstraction at once. Perhaps the brain tracks all of them simultaneously,
keeping layers of abstraction separate. This is a clue for why deep neural networks
perform better than shallow ones, which is what we'll discuss next.
Chapter 5. Why Deep Learning Works. Deep learning performs epistemic reduction.
A math-free, computer-science-free description of why deep learning works. We have now built
a base of theory for why AI works, what models are, and how to create them, what reductionism
and holism are, and what the process of reduction is. These are the fundamentals of AI epistemology.
This base allows us to discuss various strategies to move towards understanding machines in a
well-understood and controlled manner. We are now ready to discuss why deep learning,
DL, works. This is the fifth and last entry in the AI epistemology primer. Deep learning
performs reduction. This is an unsurprising claim, considering the preceding chapters.
There are several mutually compatible theories for how deep learning works. But just as in
the first chapter, we will now discuss the epistemological aspects, why it works,
from several viewpoints and levels, starting from the bottom. We will use examples from the
TensorFlow system and its API library as a stand-in for the whole deep learning family of algorithms
and TF programs, because the available API functions heavily shape and constrain the solutions
that can be implemented in this space. And the generalization should be straightforward enough.
Consider the following illustration of image understanding using Keras, an excellent
abstraction layer on top of TensorFlow. I like to refer to the input layer as being
on the bottom rather than at the far left as in this image. When viewing it my way,
the low to high dimension we use in my rotated version of the image can be mentally mapped
to a low to high stack of abstraction levels. I'm not the only one using this dimension this way.
I hope this rotation isn't too confusing. We can see that there is an obvious data reduction
and an obvious complexity reduction. Can we determine whether the system is also performing
what I'd like to call the epistemic reduction? Is it reducing away that which is unimportant?
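Before answering, it may help to see roughly what such a stack looks like in code. Here is a minimal sketch of a typical Keras image classifier; the layer counts and sizes are my own choices, not taken from the illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),        # raw pixels: 12,288 numbers
    layers.Conv2D(16, 3, activation="relu"),  # low-level features, such as edges
    layers.MaxPooling2D(2),                   # keep only the strongest signal per 2x2 window
    layers.Conv2D(32, 3, activation="relu"),  # higher-level features
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),   # ten class labels
])
model.summary()  # the shrinking output shapes show the data reduction, layer by layer
```

With that concrete picture in mind, back to the question.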
And if so, how does it accomplish this? How does an operator in a deep learning stack
know what makes something important, salient? A naive data reduction of sorts could be
accomplished by compression schemes or even random deletion. This is undesirable. We need to discard
the non-salient parts so that in the end, we are left with what is salient. Some people have not
understood the importance of salience-based reduction and promote the lossless compression power of
reversible algorithms as a measurement of intelligence, which is no more useful than
believing a simple video camera can understand what it sees. So let me conjure up, a bit like in
the movie Inside Out, a fairy tale of what goes on in a deep learning network, except we'll do it
bottom up. Suppose we have built a system for finding faces in an image with the intent of
incorporating that as a feature in a camera. Many cameras have this feature already,
so this is not a far-fetched example. We implement an image understanding neural network,
show the system many kinds of images for a few days, perhaps using so-called supervised learning
in order to improve this story, and then we show it an image of a family having a picnic in a park
and ask the system to outline where the faces are so that the camera can focus sharply on them.
The input image is converted from RGB color values to an input array and the data in this array is
then shuffled through many layers of operators. And for many of these layers, there are fewer
outputs than there are inputs, as you can see above, which means some things have to be discarded
by the processing. Each layer initially receives signals from below, that is, from the input
or from lower levels of abstraction, and produces some reduced output to send to the next layer
operator above. To continue in more detail: at some early level, some operator is given a few
adjacent pixels and determines that there is a vertical, slightly curved line dividing the
darker green area from the lighter green area. So it passes the operator above a simple line-
or color-based description, using some encoding we don't really care about. The operator at the
level above might have gotten another matching curve and says, these match what I saw a lot of
when the label blade of grass was given as a ground truth label during supervised learning.
If no label is known, then we again assume some other uninteresting representation.
It is okay to propagate results without human-labeled signals because whatever signaling scheme is
used will be learned by the level above. The operator above that says, when I get lots of
blades-of-grass signals, I reduce all of that to a lawn signal as I send it upward.
And eventually we reach the higher operator layers, and someone there says, we are a face-finder
application, we are completely uninterested in lawns, and discards the lawn as non-salient.
What remains after you discard all non-faces are the faces. You cannot discard anything
until you know what it is, or can at least estimate whether it's worth learning. Specifically,
until you understand it at the level of abstraction you are operating at. The low-level blade of
grass recognizers could not discard the grass because they had no clue about the high-level
saliencies of lawn-or-not and face-or-not that the higher layers specialize in. You can only tell
what is salient or not, important or not, at the level of understanding and abstraction you are
operating at. Each layer receives lower-level descriptions from below, discards what it
recognizes as irrelevant, and sends its own version of higher-level descriptions upward
until we reach someone who knows what we are really looking for. This is of course why deep
learning is deep. This idea itself is not new. It was discussed by Oliver Selfridge in 1959.
He described an idea called Pandemonium, which was largely ignored by the AI community because of
its radical departure from the logic-based AI promoted by people like John McCarthy and Marvin
Minsky. But Pandemonium presaged, by almost 60 years, the layer-by-layer architecture with
signals passing up and down that is used today in all deep neural networks. This is the reason my
online handle is @Pandemonica. So, do any TensorFlow operators support this reduction?
Let's start by examining the pooling operators. There are a few in the diagram. They are conceptually
simple. There are over 50 pooling operators in TensorFlow. One of them is the 2x2 max pool operator.
In the diagram, it is used four times. It is given four inputs with varying values and propagates the
highest value of those as its only output. Close to the input layer, these four values may be four
adjacent pixels, where their values might be a brightness in some color channel, but higher up they
mean whatever they mean. In effect, the 2x2 max pool discards the least important 75% of its input
data, preserving and propagating only the one highest value. In the case of pixels, it might mean the
brightest color value. In the case of blades of grass, it might mean there is at least one blade of
grass here. The interpretation of what is discarded depends on the layer, because in a very real
sense, layers represent levels of reduction, abstraction levels, if you prefer that term. And we
should now clearly see one of the most important ideas in deep neural networks: the reduction has to
be done at multiple levels of abstraction. Each set of decisions about what is reduced away as
irrelevant and what is kept as possibly relevant can only be made at an appropriate abstraction
level. We cannot yet abstract away the lawn if all we know is that there are dark and light green
areas. This is a simplification.
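Before moving on, here is a minimal numeric sketch of the 2x2 max pool just described; the input values are made up for illustration.

```python
import tensorflow as tf

# A 4x4 single-channel "image"; in a real network these would be
# activations arriving from a lower layer.
x = tf.constant([[1., 3., 2., 0.],
                 [4., 2., 1., 1.],
                 [0., 1., 5., 2.],
                 [1., 0., 2., 6.]])
x = tf.reshape(x, [1, 4, 4, 1])  # [batch, height, width, channels]

# 2x2 max pooling with stride 2: each non-overlapping 2x2 window is
# reduced to its single highest value; the other 75% is discarded.
pooled = tf.nn.max_pool2d(x, ksize=2, strides=2, padding="VALID")
print(tf.reshape(pooled, [2, 2]).numpy())
# [[4. 2.]
#  [1. 6.]]
```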
Decisions made in this manner will be heeded only if they have contributed to positive outcomes in
learning. Unreliable and useless decision makers will be ignored using any of several mechanisms
that we may apply during learning. More later. For now, we continue by examining the most popular
subset of all TensorFlow operators: the convolution family. The TensorFlow manual
notes that although these ops are called convolution, they are strictly speaking cross-correlation.
Convolution layers discover cross-correlations and co-occurrences of various kinds.
Co-occurrences of known patterns in the image at various locations. Spatial relationships
within an image itself, like Geoff Hinton's recent example of the mouth normally being found below
the nose. And more obviously, in the supervised learning case, correlations between discovered
patterns and the available meta-information, the tags and labels that correlate with the patterns
the system may discover. This is what allows an image understander to tag the occurrence of a
nose in an image with the text string nose. Beyond this, such systems may learn to understand
concepts like behind and under. The information that is propagated to the higher levels in the
network now describes these correlations. Uncorrelated information is viewed as non-salient
and is discarded. In the Keras diagram, this discarding is done by a max pooling layer after
the convolution plus ReLU layers. ReLU is a kind of layer operator that discards negative values,
introducing a non-linearity that is important for DL but not really important for our analysis.
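Here is a minimal sketch of the cross-correlation such a layer computes. The kernel is a hand-built vertical-edge pattern rather than a learned one, and all values are made up for illustration.

```python
import tensorflow as tf

# A vertical-edge detector: dark on the left, light on the right.
kernel = tf.constant([[-1., 1.],
                      [-1., 1.]])
kernel = tf.reshape(kernel, [2, 2, 1, 1])  # [height, width, in_ch, out_ch]

# An image whose left half is dark (0) and right half is light (9).
image = tf.constant([[0., 0., 9., 9.],
                     [0., 0., 9., 9.],
                     [0., 0., 9., 9.],
                     [0., 0., 9., 9.]])
image = tf.reshape(image, [1, 4, 4, 1])    # [batch, height, width, channels]

# tf.nn.conv2d computes cross-correlation: the kernel is slid over the
# input without flipping, and each position scores how well it matches.
response = tf.nn.conv2d(image, kernel, strides=1, padding="VALID")
print(tf.reshape(response, [3, 3]).numpy())
# The middle column responds strongly: that is exactly where the edge sits.
```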
This pattern of three layers, convolution, then ReLU, then a pooling layer, is quite popular
because this combination performs one reliable reduction step. These three layer types, in this
packaged sequence, may appear many times in a DL computational graph. Each of these three-layer
packages is reducing away things that the levels below had no chance of evaluating for salience,
because they didn't understand their input at the correct level. Again, this is why deep learning
is deep: you can only do reduction by discarding the irrelevant if you understand what is relevant
and irrelevant at each different level of abstraction. Is deep learning science or not?
While the deep learning process can be described using mathematical
notation, mostly using linear algebra, the process itself isn't scientific. We cannot explain how
this system is capable of forming any kind of understanding by just staring at these equations,
since understanding is an emergent effect of repeated reductions over many layers.
Consider the convolution operators. As the TF manual quote clearly states, convolution layers discover
correlations. Many blades of grass together typically mean a lawn. In TF, a lot of cycles
are spent on discovering these correlations. Once found, a correlation leads to some
adjustments of some weight, to make the correct reduction more likely to be rediscovered
the next round, because this reduction is done multiple times. But in essence,
all correlations are forgotten and have to be rediscovered in every pass through the deep
learning loop of upward signaling and downward gradient descent with minute adjustments to
erring variables. This system is in effect learning from its mistakes, which is a good sign,
since that may well be the only way to learn anything. At least at these levels.
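As a minimal sketch of that loop, using the standard TensorFlow 2 training idiom with a made-up model and made-up data:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

x = tf.random.normal([32, 8])  # an arbitrary batch of inputs
y = tf.random.normal([32, 1])  # arbitrary targets

for step in range(10):
    with tf.GradientTape() as tape:
        pred = model(x)          # upward signaling, input to output
        loss = loss_fn(y, pred)  # how wrong were we this pass?
    grads = tape.gradient(loss, model.trainable_variables)
    # Downward pass: minute adjustments to the erring variables.
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
```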
This up and down may be repeated many times for each image in the learning set. This up and down
makes some sense for image understanding. Some are using the same algorithms for text.
Fortunately, in the text case, there are very efficient alternatives to this ridiculously
expensive algorithm. For starters, we can represent the discovered correlations explicitly,
using regular pointers or object references in our programming languages.
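A minimal sketch of what such an explicit representation might look like; the class and its names are hypothetical, not an existing library.

```python
class SoftwareNeuron:
    """A node that stores discovered correlations as plain references."""
    def __init__(self, name):
        self.name = name
        self.correlations = []  # list of (other_neuron, strength) pairs

    def correlate_with(self, other, strength):
        # The "synapse": an explicit reference saying this neuron
        # correlates with that neuron, this strongly.
        self.correlations.append((other, strength))

blade = SoftwareNeuron("blade of grass")
lawn = SoftwareNeuron("lawn")
blade.correlate_with(lawn, 0.9)  # many blades of grass typically mean a lawn
```

Once stored, the correlation never needs to be rediscovered.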
Or, as synapses in brains. This software neuron correlates with that software neuron, says a
synapse or reference connecting this to that. We shall discuss such systems in the section on
organic learning, which is coming up next. Neither the deep learning family of algorithms
nor organic learning is scientific in any meaningful way. They jump to conclusions on
scant evidence and trust correlations without insisting on provable causality. This is disallowed
in scientific theory, where absolutely reliable causality is the coin of the realm.
F equals ma, or go home. Most deep neural network programming is uncomfortably close to
trial and error, with only minor clues about how to improve the system when reaching mediocre results.
Adding more layers doesn't always help. These kinds of problems are the everyday reality for
most practitioners of deep neural networks. With no a priori models, there will be no a priori
guarantees. The best estimate of the reliability and correctness of any deep neural network,
or even any holistic system we can ever devise, is going to be extensive testing.
More on this later. Why would we ever use engineered systems that cannot be guaranteed
to provide the correct answer? Because we have no choice. We only use holistic methods when the
reliable reductionist methods are unavailable, as is the case when the task requires the ability
to perform autonomous reduction of context-rich slices of our rich, complex reality as a whole,
when the task requires understanding. Don't we have an alternative to these unreliable machines?
Sure we do. There are billions of humans on the planet that are already masters of this complex
task because they live in the rich world and have skills that are unavailable with reductionist methods,
starting with low level things like object permanence. So you can replace a well performing
but theoretically unproven contraption, a holistic understanding machine built out of deep neural
networks, with a well performing human being using a deeply mystical kind of understanding
hidden in their opaque heads, who earns much more per hour. This doesn't look like much of an
improvement. The machine cannot be proven correct because it doesn't function like normal computers.
It is performing reduction, a skill formerly restricted to animals. A holistic skill. My
favorite soundbite is a mere corollary to the frame problem by McCarthy and Hayes. You have seen
it and you will see it again, since it is one of the stronger results of AI epistemology.
But we will, in but a few years, agree on a definition of intelligence that makes autonomous
reduction a requirement. This once semi-heretical soundbite will then be obvious to all, if it
isn't already: our intelligences are fallible. Chapter 6. Experimental Epistemology for AI.
We can now create computer-based experimental implementations of epistemology-level theories
in order to test them and learn from the outcomes. Experimental epistemology is the use of the
experimental methods of the cognitive sciences to shed light on debates within epistemology,
the philosophical study of knowledge and rationally justified belief. Some skeptics contend that
experimental epistemology or experimental philosophy more generally is an oxymoron.
If you are doing experiments, they say, you are not doing philosophy. You are doing psychology
or some other scientific activity. It is true that the part of experimental philosophy that is
devoted to carrying out experiments and performing statistical analyses on the data obtained is
primarily a scientific rather than a philosophical activity. However, because the experiments are
designed to shed light on debates within philosophy, the experiments themselves grow out of mainstream
philosophical debate and their results are injected back into the debate, with an eye toward
moving the debate forward. This part of experimental philosophy is indeed philosophy,
not philosophy as usual perhaps, but philosophy nonetheless. Experimental epistemology, by James
R. Beebe. Traditional experimental epistemology conducted experiments using interviews and psychological
tests on human volunteers or relied on population statistics. As one of the newer branches of
cognitive science, machine learning has now provided us with a very different approach
to this domain. We can now create computer-based experimental implementations of epistemology-level
theories in order to test them and learn from the outcomes. In machine learning, the most
important epistemology level concepts and hypotheses are about reasoning, understanding,
learning, epistemic reduction, abstraction, creativity, prediction, attention, instincts,
intuitions, concepts, resiliency, models, reductionism, holism, and other things, all
sharing these features. One, science has no equations, formulas, or other models for how
they work. They're epistemology level concepts, not science level concepts. Two, our theories
about these concepts have to be sufficiently solid and detailed to allow for computer implementations.
This is because science itself is built on top of epistemology level concepts, and practitioners
need to be aware of this or they will experience cognitive dissonance-induced confusion and stress.
The red pill of machine learning confronts the elephant in the room of machine learning.
Machine learning is not scientific. What can we learn from AI epistemology? An excerpt from the
red pill: consider the following statements from the domain of epistemology, and how each of them
can be viewed as an implementation hint for AI designers. We are already able to measure
their effects on system competence. You can only learn that which you already almost know.
Patrick Winston, MIT. Our intelligences are fallible. Monica Anderson. In order to detect
that something is new, you need to recognize everything old. Monica Anderson. You cannot
reason about that which you do not understand. Monica Anderson. You are known by the company
you keep, a simple version of the Yoneda Lemma from category theory and the justification for embeddings
in deep learning. All useful novelty in the universe is due to processes of variation and
selection. The Selectionist Manifesto. Selectionism is the generalization of Darwinism. This is
why genetic algorithms work. Science has no equations for concepts like understanding, reasoning,
learning, abstraction, or modeling since they are all epistemology level concepts.
We cannot even start using science until we have decided what model to use. We must use our
experience to perform epistemic reductions, discarding the irrelevant, starting from the messy
real world problem situation until we are left with a scientific model we can use, such as an
equation. The focus in AI research should be on exactly how we can get our machines to perform
this pre-scientific epistemic reduction by themselves and the answer to that cannot be found inside
of science. Chapter 7. The Red Pill of Machine Learning. Reductionism is the use of models.
Holism is the avoidance of models. Models are scientific models, theories, hypotheses, formulas,
equations, naive models based on personal experiences, superstitions, and traditional
computer programs. The deep learning revolution of 2012 changed how we think about artificial
intelligence, machine learning, and deep neural networks. What changed, and what does this mean
going forward? The new cognitive capabilities in our machines are the result of a shift in the way
we think about problem solving. It is the most significant change ever in artificial intelligence
AI, if not in science as a whole. Machine learning, ML based systems are successfully
attacking both simple and complex problems using novel methods that only became available after
2012. We are experiencing a revolution at the level of epistemology which will affect much more
than just the field of machine learning. We want to add more of these novel methods to our
standard problem solving toolkit, but we need to understand the trade-offs and the conflict.
I argue that understanding deep neural networks, DNNs, and other ML technologies requires that
practitioners adopt a holistic stance which is, at important levels, blatantly incompatible with
the reductionist stance of modern science. As ML practitioners we have to make hard choices
that seemingly contradict many of our core scientific convictions. As a result we may get
the feeling something is wrong. The conflict is real and important and the seemingly counter-intuitive
choices make sense only when viewed in the light of epistemology. Improved clarity in these matters
should alleviate the cognitive dissonance experienced by some ML practitioners and should
accelerate progress in these fields. The title refers to the eye-opening clarity
some machine learning practitioners achieve when adopting a holistic stance. Parallel dichotomies.
At Syntience Inc, our research is natural language understanding, NLU. We are creating novel
systems that allow computers to learn to understand human natural languages, any one of them.
We use deep neural networks of our own design. The goal is to achieve some kind of human-like
but not necessarily human-level understanding. This is very different from traditional natural
language processing, NLP, which relies on human-made models of some language, such as English,
and perhaps models of fragments of the world. The NLP and NLU disciplines have chosen
opposite answers to their difficult two-way choices. They are now defined by these choices,
and we can use their stances to highlight the main conflict. The split is so deep
that it cuts through many layers of our reality. The following dichotomies are all manifestations
of this incompatibility at different levels, listed by impact, but discussed in no particular order.
The main one: science, versus the complex, including the mundane. Epistemology: reductionism,
versus holism. Meanings: reasoning, versus understanding. Problem solving: plan it, then do it,
versus just do it. Artificial intelligence: 20th century good old-fashioned AI, versus machine
learning and deep neural networks. Natural language and computers: NLP, versus NLU. The problem-solving level
provides many familiar examples of these issues. In our mundane lives, we solve many kinds of
problems every day but our strategies for solving them fall into just those two categories.
For any complicated problem, we had better have a plan before we start, but most problems
the brain deals with every day are things we never have to think about because we do not need to plan
or reason about them. These are the millions of low-level problems we encounter in our
mundane lives every day, and this is the world that our AIs will have to operate in.
Consider someone walking across the floor. Their brain signals their leg muscles to contract in
the correct cadence. Do they need to consciously plan each step? Do they reason about how to
maintain their balance? No. They probably don't even know what leg muscles they have.
Consider understanding this sentence. Did you use reasoning? Did you use grammar?
If you are a fluent speaker, you do not need grammars to understand or produce language,
and you do not have time to reason about language while hearing it spoken. Reasoning is slow,
but understanding is instantaneous. Consider someone braking for a stoplight.
How hard should they push on the brake pedal? Do they compute the required differential equation?
Should such equations be part of the driver's license test?