Python Descriptors Demystified

Python includes many built-in language features to enable concise,
easily-understood code. Some of these niceties include list/set/dictionary
comprehensions, properties, and decorators. For the most part, these
"intermediate-level" language features are well-documented and easy to learn.

There is one notable exception to this: descriptors. For me at least,
descriptors were the feature of the core Python language that remained
mysterious for the longest time. There are a few reasons for this:

1. The official documentation on descriptors is rather esoteric, and doesn't
   include good use cases for why you might write descriptors (my apologies to
   Raymond Hettinger, whose other Python articles and videos I have found very
   helpful).
2. The syntax for writing descriptors is a little weird.
3. Custom descriptors might be the least-utilized feature of the Python
   language, so it's hard to find good examples in open source projects.

Nevertheless, descriptors do have their uses once you figure them out. This
document tries to build the argument for what descriptors do, and why you
should care.

The punchline: descriptors are reusable properties

Here's what we're building up to: fundamentally, descriptors are properties
that you can reuse. That is, descriptors let you write code that looks like
this

    f = Foo()
    b = f.bar
    f.bar = c
    del f.bar

and, behind the scenes, calls custom methods when trying to access (b = f.bar),
assign to (f.bar = c), or delete (del f.bar) an instance variable. Let's
establish why being able to disguise function calls as attribute access is a
good thing.

Properties disguise function calls as attributes

Imagine you are writing some code to organize information about movies (spoiler
alert: these projects beat you to it). You might end up with a Movie class that
looks like this:

    class Movie(object):
        def __init__(self, title, rating, runtime, budget, gross):
            self.title = title
            self.rating = rating
            self.runtime = runtime
            self.budget = budget
            self.gross = gross

        def profit(self):
            return self.gross - self.budget

You start using this class in other parts of your project, but then you realize
something: by mistake, you sometimes assign negative budgets to movies. You
decide this is bad, and want the Movie class to forbid this. The first thing
you think to try is this:

    class Movie(object):
        def __init__(self, title, rating, runtime, budget, gross):
            self.title = title
            self.rating = rating
            self.runtime = runtime
            self.gross = gross
            if budget < 0:
                raise ValueError("Negative value not allowed: %s" % budget)
            self.budget = budget

        def profit(self):
            return self.gross - self.budget

But that won't work, because other parts of your code assign values to the
budget attribute directly -- this new class catches data entry errors within
the __init__ method, but not the cases where somebody runs m.budget = -100 on a
pre-existing instance. What's a cinephile pythonista to do?

Luckily, Python properties solve this problem. If you've never seen properties
before, here's how they work:

    class Movie(object):
        def __init__(self, title, rating, runtime, budget, gross):
            self._budget = None

            self.title = title
            self.rating = rating
            self.runtime = runtime
            self.gross = gross
            self.budget = budget

        @property
        def budget(self):
            return self._budget

        @budget.setter
        def budget(self, value):
            if value < 0:
                raise ValueError("Negative value not allowed: %s" % value)
            self._budget = value

        def profit(self):
            return self.gross - self.budget

    m = Movie('Casablanca', 97, 102, 964000, 1300000)
    print m.budget       # calls the budget getter, returns the result
    try:
        m.budget = -100  # calls the budget setter with -100, which raises ValueError
    except ValueError:
        print "Woops. Not allowed"

    964000
    Woops. Not allowed

We specify a getter method with a @property decorator, and a setter method with
a @budget.setter decorator. When we do that, Python automatically calls the
getter whenever anybody tries to access the budget. Likewise, Python
automatically calls the setter whenever it encounters code like m.budget =
value.

Take a moment to appreciate how nice it is that Python does this: if
properties didn't exist, we'd have to hide all of our instance attributes, and
provide lots of explicit methods like get_budget and set_budget. Code that uses
our classes would constantly be calling these getter/setter methods, and would
start to look like crufty Java code. Even worse, if we ignored this coding
style and just gave direct access to an instance attribute like budget, there
would be no clean way to later add the non-negativity check -- we would have to
retroactively create the set_budget method, and search our entire project to
change lines like m.budget = value to m.set_budget(value). Gross.

So properties let you attach custom code to variable getting/setting, while
maintaining a simple attribute-like interface for your classes. Nice.

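To make the retrofitting point concrete, here is a minimal sketch written in Python 3 syntax (the surrounding examples use Python 2). The class name Film and its two fields are made up for illustration. The idea is that code which already assigns to film.budget keeps working unchanged once the check is added.

```python
class Film:
    """Hypothetical class whose callers already read and write .budget directly."""

    def __init__(self, title, budget):
        self.title = title
        self.budget = budget  # this assignment goes through the setter below

    @property
    def budget(self):
        # Getter: runs on every read of film.budget
        return self._budget

    @budget.setter
    def budget(self, value):
        # Setter: runs on every write, including the one in __init__
        if value < 0:
            raise ValueError("Negative value not allowed: %s" % value)
        self._budget = value


film = Film("Casablanca", 964000)
film.budget = 1000000      # existing call sites keep their plain assignment syntax
print(film.budget)         # 1000000

try:
    film.budget = -100     # the retrofitted check fires on ordinary assignment
except ValueError as err:
    print("rejected:", err)
```

Because the check lives in the setter, it also covers the assignment inside __init__, so constructing a Film with a negative budget fails in the same way.
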
Properties Get Tedious

The main downside to properties is that they aren't reusable. For example,
let's assume you want to add the non-negativity check to the rating, runtime,
and gross fields as well. Here's the new class:

    class Movie(object):
        def __init__(self, title, rating, runtime, budget, gross):
            self._rating = None
            self._runtime = None
            self._budget = None
            self._gross = None

            self.title = title
            self.rating = rating
            self.runtime = runtime
            self.gross = gross
            self.budget = budget

        #nice
        @property
        def budget(self):
            return self._budget

        @budget.setter
        def budget(self, value):
            if value < 0:
                raise ValueError("Negative value not allowed: %s" % value)
            self._budget = value

        #ok
        @property
        def rating(self):
            return self._rating

        @rating.setter
        def rating(self, value):
            if value < 0:
                raise ValueError("Negative value not allowed: %s" % value)
            self._rating = value

        #uhh...
        @property
        def runtime(self):
            return self._runtime

        @runtime.setter
        def runtime(self, value):
            if value < 0:
                raise ValueError("Negative value not allowed: %s" % value)
            self._runtime = value

        #is this forever?
        @property
        def gross(self):
            return self._gross

        @gross.setter
        def gross(self, value):
            if value < 0:
                raise ValueError("Negative value not allowed: %s" % value)
            self._gross = value

        def profit(self):
            return self.gross - self.budget

That's a lot of code, and a lot of duplicated logic. While properties make the
outsides of classes look nice, they don't make the insides of classes look
nice.

Descriptors (Finally)

This is the problem that descriptors solve. Descriptors generalize properties,
and let you write separate classes for reusable property logic. Here's an
example of how they work (for the moment, don't worry about what's inside
NonNegative):

    from weakref import WeakKeyDictionary

    class NonNegative(object):
        """A descriptor that forbids negative values"""
        def __init__(self, default):
            self.default = default
            self.data = WeakKeyDictionary()

        def __get__(self, instance, owner):
            # we get here when someone evaluates x.d, and d is a NonNegative instance
            # instance = x
            # owner = type(x)
            return self.data.get(instance, self.default)

        def __set__(self, instance, value):
            # we get here when someone runs x.d = val, and d is a NonNegative instance
            # instance = x
            # value = val
            if value < 0:
                raise ValueError("Negative value not allowed: %s" % value)
            self.data[instance] = value

    class Movie(object):
        #always put descriptors at the class-level
        rating = NonNegative(0)
        runtime = NonNegative(0)
        budget = NonNegative(0)
        gross = NonNegative(0)

        def __init__(self, title, rating, runtime, budget, gross):
            self.title = title
            self.rating = rating
            self.runtime = runtime
            self.budget = budget
            self.gross = gross

        def profit(self):
            return self.gross - self.budget

    m = Movie('Casablanca', 97, 102, 964000, 1300000)
    print m.budget     # calls Movie.budget.__get__(m, Movie)
    m.rating = 100     # calls Movie.rating.__set__(m, 100)
    try:
        m.rating = -1  # calls Movie.rating.__set__(m, -1)
    except ValueError:
        print "Woops, negative value"

    964000
    Woops, negative value

There's some new syntax in here, so let's look at things piece by piece:

NonNegative is a descriptor object. It's a descriptor because it defines a
__get__, __set__, or __delete__ method (here, the first two).

The Movie class looks very clean. We create 4 descriptors at the class level,
and treat them like normal (instance-level) attributes everywhere else. And
apparently, the descriptors are checking for non-negative values for us.

Accessing a descriptor

When Python sees the line print m.budget, it recognizes that budget is a
descriptor with a __get__ method. Instead of passing m.budget to print
directly, it calls Movie.budget.__get__, and feeds the result of that to print.
This is similar to what happens when you access a property -- Python
automatically calls a method, and returns the result.

__get__ receives two arguments: the instance object to the left of the period
(that is, the m object in m.budget), and the type of that instance (Movie). In
some Python documentation, Movie is called the owner of the descriptor. If we
had asked for Movie.budget, Python would have called
Movie.budget.__get__(None, Movie); that is, the first argument is either an
instance of the owner, or None. These input arguments may seem weird to you,
but they're there to give you information about what object the descriptor is
part of. This will make sense once we look inside the NonNegative class.

Assigning to a descriptor

When Python sees m.rating = 100, Python recognizes rating is a descriptor with
a __set__ method, and it calls Movie.rating.__set__(m, 100). Like __get__, the
first argument of __set__ is the instance to the left of the period (the m in
m.rating = 100). The second argument is the value to the right of the equals
sign (100).

Deleting a descriptor

For the sake of completeness, if you call del m.budget, Python will call
Movie.budget.__delete__(m).

How NonNegative works

With this in mind, we can now look to see how the NonNegative class works. Each
instance of NonNegative maintains a dictionary that maps owner instances to
data values. When we call m.budget, the __get__ method looks up the data
associated with m, and returns the result (or a default value, if no such value
exists). __set__ uses the same approach, but includes the extra non-negativity
check. We use a WeakKeyDictionary instead of a normal dict to prevent a memory
leak -- we don't want an instance to stay alive simply because it's in the
descriptor dictionary, and otherwise unused.

Working with descriptors is slightly awkward. Because they live at the class
level, every instance shares the same descriptor. This means that descriptors
have to manually manage different states for different object instances, and
need to explicitly be passed instances as the first argument of the __get__,
__set__, and __delete__ methods.

Hopefully, however, this example gives you an idea of what descriptors can be
useful for -- they provide a way to organize property logic into isolated
classes. If you find yourself repeating the same logic across several
properties, that should be a clue to consider whether refactoring that code
into a descriptor is worthwhile.

Recipes and Gotchas

Put descriptors at the class level

For descriptors to work properly, they must be defined at the class level. If
they aren't, Python doesn't automatically invoke the __get__ and __set__
methods for you:

    class Broken(object):
        y = NonNegative(5)

        def __init__(self):
            self.x = NonNegative(0)  # NOT a good descriptor

    b = Broken()
    print "X is %s, Y is %s" % (b.x, b.y)

    X is <__main__.NonNegative object at 0x10432c250>, Y is 5

As you can see, accessing the class-level descriptor y automatically calls
__get__. However, accessing the instance-level descriptor x returns the
descriptor itself, sans magic.

Make sure to keep instance-level data instance-specific

You might be tempted to write the NonNegative descriptor like this:

    class BrokenNonNegative(object):
        def __init__(self, default):
            self.value = default

        def __get__(self, instance, owner):
            return self.value

        def __set__(self, instance, value):
            if value < 0:
                raise ValueError("Negative value not allowed: %s" % value)
            self.value = value

    class Foo(object):
        bar = BrokenNonNegative(5)

    f = Foo()
    try:
        f.bar = -1
    except ValueError:
        print "Caught the invalid assignment"

    Caught the invalid assignment

That seems to work fine. The problem here is that all instances of Foo share
the same bar instance, leading to this flavor of sadness:

    class Foo(object):
        bar = BrokenNonNegative(5)

    f = Foo()
    g = Foo()

    print "f.bar is %s\ng.bar is %s" % (f.bar, g.bar)
    print "Setting f.bar to 10"
    f.bar = 10
    print "f.bar is %s\ng.bar is %s" % (f.bar, g.bar)  #ouch

    f.bar is 5
    g.bar is 5
    Setting f.bar to 10
    f.bar is 10
    g.bar is 10

This is why we used the data dictionary in NonNegative. The first argument to
__get__ and __set__ tells us which instance to consider. NonNegative uses this
argument as a dictionary key, to keep data for each Foo instance separate.

    class Foo(object):
        bar = NonNegative(5)

    f = Foo()
    g = Foo()
    print "f.bar is %s\ng.bar is %s" % (f.bar, g.bar)
    print "Setting f.bar to 10"
    f.bar = 10
    print "f.bar is %s\ng.bar is %s" % (f.bar, g.bar)  #better

    f.bar is 5
    g.bar is 5
    Setting f.bar to 10
    f.bar is 10
    g.bar is 5

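The data dictionary in NonNegative is a WeakKeyDictionary rather than a plain dict. As a rough illustration of why, the Python 3 sketch below compares the two kinds of store directly, outside of any descriptor. The Owner class is a made-up stand-in for a class like Movie or Foo, and the printed lengths assume CPython, where an object is reclaimed as soon as its last reference goes away.

```python
import gc
from weakref import WeakKeyDictionary


class Owner:
    """Hypothetical stand-in for a class like Movie or Foo."""


weak_store = WeakKeyDictionary()  # the kind of mapping NonNegative keeps in self.data
strong_store = {}                 # a plain dict, for comparison

a = Owner()
b = Owner()
weak_store[a] = 10
strong_store[b] = 10

# Drop the only ordinary references to both owners
del a
del b
gc.collect()  # not required on CPython; included for other implementations

print(len(weak_store))    # 0 on CPython: the entry vanished with its owner
print(len(strong_store))  # 1: the dict key itself keeps the owner alive
```

With a plain dict, every instance that ever had a value assigned would stay in memory for as long as the descriptor (and therefore the class) exists.
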

Having to manage per-instance state by hand is the most awkward aspect of
descriptors (full disclosure: I don't actually understand why Python doesn't
let you define descriptors at the instance level, and always dispatch to
__get__ and __set__. There must be some reason why this doesn't work).

Beware unhashable descriptor owners

NonNegative uses a dictionary to keep instance-specific data separate. This
normally works fine, unless you want to use descriptors with unhashable
objects:

    class MoProblems(list):  # you can't use lists as dictionary keys
        x = NonNegative(5)

    m = MoProblems()
    print m.x  # womp womp

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-8-dd73b177bd8d> in <module>()
          3
          4 m = MoProblems()
    ----> 5 print m.x  # womp womp

    <ipython-input-3-6671804ce5d5> in __get__(self, instance, owner)
          9         # instance = x
         10         # owner = type(x)
    ---> 11         return self.data.get(instance, self.default)
         12
         13     def __set__(self, instance, value):

    TypeError: unhashable type: 'MoProblems'

Because instances of MoProblems (which is a subclass of list) aren't hashable,
they can't be used as keys in the data dictionary for MoProblems.x. There are a
few ways around this, though none are perfect. The best approach is probably to
"label" your descriptors:

    class Descriptor(object):
        def __init__(self, label):
            self.label = label

        def __get__(self, instance, owner):
            print '__get__', instance, owner
            return instance.__dict__.get(self.label)

        def __set__(self, instance, value):
            print '__set__'
            instance.__dict__[self.label] = value

    class Foo(list):
        x = Descriptor('x')
        y = Descriptor('y')

    f = Foo()
    f.x = 5
    print f.x

    __set__
    __get__ [] <class '__main__.Foo'>
    5

This relies on a highly non-obvious detail of how Python looks up attributes.
We label each descriptor in Foo with the same name as the variable that we
assign the descriptor to (for example, x = Descriptor('x')). The descriptor
then stores instance-specific data in f.__dict__['x']. This dictionary entry
would normally be what Python returns when we ask for f.x. However, because
Foo.x is a descriptor that defines __set__, Python consults the descriptor
first instead of returning f.__dict__['x'] directly, and the descriptor can
safely store stuff there. Just make sure you don't label the descriptor
anything else:

    class Foo(object):
        x = Descriptor('y')

    f = Foo()
    f.x = 5
    print f.x

    f.y = 4  #oh no!
    print f.x

    __set__
    __get__ <__main__.Foo object at 0x10432c810> <class '__main__.Foo'>
    5
    __get__ <__main__.Foo object at 0x10432c810> <class '__main__.Foo'>
    4

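The lookup rule that the labeling trick depends on can be seen in isolation. In the Python 3 sketch below (all class names are made up for illustration), a descriptor that defines __set__ wins over an instance dictionary entry of the same name, while a descriptor that defines only __get__ is shadowed by it.

```python
class DataDesc:
    """Defines __get__ and __set__, so Python treats it as a data descriptor."""

    def __get__(self, instance, owner):
        return "from descriptor"

    def __set__(self, instance, value):
        pass  # writes are ignored; only the lookup order matters here


class NonDataDesc:
    """Defines only __get__, so it is a non-data descriptor."""

    def __get__(self, instance, owner):
        return "from descriptor"


class Holder:
    a = DataDesc()
    b = NonDataDesc()


h = Holder()
# Write straight into the instance dictionary, bypassing any __set__
h.__dict__["a"] = "from instance dict"
h.__dict__["b"] = "from instance dict"

print(h.a)  # from descriptor: the data descriptor takes precedence
print(h.b)  # from instance dict: the entry shadows the non-data descriptor
```

This is why a labeled descriptor needs a __set__ method for the pattern to be safe: without one, the value it stored under its own name would hide the descriptor on the next read.
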

I don't love the labeling pattern, since it's fragile and subtle, but it's
fairly common, and it works for unhashable owner classes. David Beazley uses it
in his books.

Labeled Descriptors with Metaclasses

Because descriptor labels match the variable name they are assigned to, some
people use metaclasses to take care of this bookkeeping automatically:

    class Descriptor(object):
        def __init__(self):
            #notice we aren't setting the label here
            self.label = None

        def __get__(self, instance, owner):
            print '__get__. Label = %s' % self.label
            return instance.__dict__.get(self.label, None)

        def __set__(self, instance, value):
            print '__set__'
            instance.__dict__[self.label] = value

    class DescriptorOwner(type):
        def __new__(cls, name, bases, attrs):
            # find all descriptors, auto-set their labels
            for n, v in attrs.items():
                if isinstance(v, Descriptor):
                    v.label = n
            return super(DescriptorOwner, cls).__new__(cls, name, bases, attrs)

    class Foo(object):
        __metaclass__ = DescriptorOwner
        x = Descriptor()

    f = Foo()
    f.x = 10
    print f.x

    __set__
    __get__. Label = x
    10

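The __metaclass__ attribute used above is Python 2 syntax. On Python 3.6 and later, the same auto-labeling is available without a metaclass: when a class is created, Python calls __set_name__(owner, name) on each class attribute that defines it. A minimal sketch of that alternative, with the class names Labeled and Point made up for illustration:

```python
class Labeled:
    """Descriptor that learns its own attribute name via __set_name__."""

    def __set_name__(self, owner, name):
        # Called once, when the owner class is created, with the assigned name
        self.label = name

    def __get__(self, instance, owner):
        if instance is None:
            return self  # class-level access returns the descriptor itself
        return instance.__dict__.get(self.label)

    def __set__(self, instance, value):
        instance.__dict__[self.label] = value


class Point(list):  # an unhashable owner, as in the MoProblems example
    x = Labeled()
    y = Labeled()


p = Point()
p.x = 10
print(p.x)                           # 10
print(p.y)                           # None (nothing stored yet)
print(Point.x.label, Point.y.label)  # x y
```

The labels can no longer drift out of sync with the variable names, since nobody types them by hand.
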

I won't explain the details of metaclasses -- David Beazley's tutorial at the
bottom of this article covers them. The main point is that the metaclass
auto-assigns descriptor labels, to match the variable name that each descriptor
is assigned to.

While this solves the problem of mismatched descriptor labels and variable
names, it does so by adding all the complexity of metaclasses. You can decide
if this is worth the hassle, but I have my doubts.

Accessing Descriptor Methods

Descriptors are just classes, and you may want to add other methods to them.
For example, descriptors are a great way to implement callback properties. Say
we want a class to notify us whenever part of its state changes. Here's most of
the code to do that:

    class CallbackProperty(object):
        """A property that will alert observers upon updates"""
        def __init__(self, default=None):
            self.data = WeakKeyDictionary()
            self.default = default
            self.callbacks = WeakKeyDictionary()

        def __get__(self, instance, owner):
            return self.data.get(instance, self.default)

        def __set__(self, instance, value):
            for callback in self.callbacks.get(instance, []):
                # alert callback function of new value
                callback(value)
            self.data[instance] = value

        def add_callback(self, instance, callback):
            """Add a new function to call every time the descriptor updates"""
            #but how do we get here?!?!
            if instance not in self.callbacks:
                self.callbacks[instance] = []
            self.callbacks[instance].append(callback)

    class BankAccount(object):
        balance = CallbackProperty(0)

    def low_balance_warning(value):
        if value < 100:
            print "You are poor"

    ba = BankAccount()

    # will not work -- try it
    #ba.balance.add_callback(ba, low_balance_warning)

This is a promising pattern -- we can attach custom callback functions to
respond to state changes within a class, without having to modify the class
code at all. That's a lovely separation of concerns. All we need to do now is
call ba.balance.add_callback(ba, low_balance_warning), so that
low_balance_warning is called whenever balance changes.

But how do we do that? Descriptors always call __get__ when we try to access
them. It would seem that the add_callback method is unreachable! The trick is
to take advantage of the special case that, when accessed from the class level,
the first argument to __get__ is None:

    class CallbackProperty(object):
        """A property that will alert observers upon updates"""
        def __init__(self, default=None):
            self.data = WeakKeyDictionary()
            self.default = default
            self.callbacks = WeakKeyDictionary()

        def __get__(self, instance, owner):
            if instance is None:
                return self
            return self.data.get(instance, self.default)

        def __set__(self, instance, value):
            for callback in self.callbacks.get(instance, []):
                # alert callback function of new value
                callback(value)
            self.data[instance] = value

        def add_callback(self, instance, callback):
            """Add a new function to call every time the descriptor within instance updates"""
            if instance not in self.callbacks:
                self.callbacks[instance] = []
            self.callbacks[instance].append(callback)

    class BankAccount(object):
        balance = CallbackProperty(0)

    def low_balance_warning(value):
        if value < 100:
            print "You are now poor"

    ba = BankAccount()
    BankAccount.balance.add_callback(ba, low_balance_warning)

    ba.balance = 5000
    print "Balance is %s" % ba.balance
    ba.balance = 99
    print "Balance is %s" % ba.balance

    Balance is 5000
    You are now poor
    Balance is 99
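
Returning self when instance is None is a convention rather than a requirement. Another route to the descriptor object, sketched below in Python 3 syntax, is to look it up in the class namespace, which skips __get__ entirely. The ReadCounter descriptor and the Gauge class are hypothetical and exist only to show the lookup.

```python
class ReadCounter:
    """Hypothetical descriptor with an extra method and no `instance is None` branch."""

    def __init__(self):
        self.reads = 0

    def __get__(self, instance, owner):
        self.reads += 1
        return "value"

    def describe(self):
        return "read %d time(s)" % self.reads


class Gauge:
    level = ReadCounter()


g = Gauge()
g.level  # ordinary access: triggers __get__ and returns "value"

# Raw lookup in the class namespace: no __get__ call, so we get the descriptor itself
descriptor = vars(Gauge)["level"]
print(type(descriptor).__name__)  # ReadCounter
print(descriptor.describe())      # read 1 time(s)
```

The instance-is-None branch reads better at the call site (BankAccount.balance.add_callback(...)), while the namespace lookup works even for descriptors that were written without it.
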