"Least Astonishment" and the Mutable Default Argument

Anyone tinkering with Python long enough has been bitten (or torn to pieces) by the following issue:

def foo(a=[]):
    a.append(5)
    return a

Python novices would expect this function to always return a list with only one element: [5]. The result turns out to be very different, and very astonishing (for a novice):

>>> foo()
[5]
>>> foo()
[5, 5]
>>> foo()
[5, 5, 5]
>>> foo()
[5, 5, 5, 5]
>>> foo()
[5, 5, 5, 5, 5]

A manager of mine once had his first encounter with this feature, and called it a "dramatic design flaw" of the language. I replied that the behavior has an underlying explanation, and it is indeed very puzzling and unexpected if you don't understand the internals. However, I was not able to answer (to myself) the following question: what is the reason for binding the default argument at function definition, and not at function execution? I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs?)

EDIT

Baczek made an interesting example. Together with most of your comments, and Utaal's in particular, I elaborated further:

>>> def a():
...     print("a executed")
...     return []
...
>>> def b(x=a()):
...     x.append(5)
...     print(x)
...
a executed
>>> b()
[5]
>>> b()
[5, 5]

To me, it seems that the design decision was relative to where to put the scope of parameters: inside the function, or "together" with it?

Doing the binding inside the function would mean that x is effectively bound to the specified default when the function is called, not when it is defined, something that would present a deep flaw: the def line would be "hybrid", in the sense that part of the binding (of the function object) would happen at definition, and part (the assignment of the default parameters) at function invocation time.

The actual behavior is more consistent: everything on that line gets evaluated when the line is executed, meaning at function definition.
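A small sketch (the names are mine) showing that the def statement itself is what gets executed, so each execution of a def produces a fresh default object:

```python
def make_appender():
    # Each execution of this def statement evaluates "a=[]" afresh,
    # so each returned function gets its own default list.
    def f(a=[]):
        a.append(1)
        return a
    return f

f1 = make_appender()
f2 = make_appender()
f1()
f1()
print(f1())  # [1, 1, 1] - f1's default accumulated across its calls
print(f2())  # [1] - f2 got its own list when its def statement ran
```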

Actually, this is not a design flaw, and it is not because of internals or performance.
It comes simply from the fact that functions in Python are first-class objects, and not only a piece of code.

As soon as you think of it this way, it completely makes sense: a function is an object being evaluated on its definition; default parameters are a kind of "member data", and therefore their state may change from one call to the other — exactly as in any other object.

In any case, the effbot has a very nice explanation of the reasons for this behavior in Default Parameter Values in Python.
I found it very clear, and I really suggest reading it for a better knowledge of how function objects work.

Suppose you have the following code:

fruits = ("apples", "bananas", "loganberries")

def eat(food=fruits):
    ...

When I see the declaration of eat, the least astonishing thing is to think that if the first parameter is not given, it will be equal to the tuple ("apples", "bananas", "loganberries").

However, suppose that later on in the code, I do something like:

def some_random_function():
    global fruits
    fruits = ("blueberries", "mangos")

Then, if default parameters were bound at function execution rather than at function declaration, I would be astonished (in a very bad way) to discover that fruits had been changed. This would be more astonishing, IMO, than discovering that your foo function above was mutating the list.
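To make the point concrete, here is a sketch of the definition-time capture being described (assuming, for brevity, that eat simply returns its argument):

```python
fruits = ("apples", "bananas", "loganberries")

def eat(food=fruits):
    return food

# Rebinding the global name later does NOT touch the already-captured default:
fruits = ("blueberries", "mangos")
print(eat())  # ('apples', 'bananas', 'loganberries')
```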

The real problem lies with mutable variables, and all languages have this problem to some extent. Here's a question: suppose in Java I have the following code:

StringBuffer s = new StringBuffer("Hello World!");
Map<StringBuffer,Integer> counts = new HashMap<StringBuffer,Integer>();
counts.put(s, 5);
s.append("!!!!");
System.out.println( counts.get(s) );  // does this work?

Now, does my map use the value of the StringBuffer key when it was placed into the map, or does it store the key by reference? Either way, someone is astonished: either the person who tried to get the object out of the map using a value identical to the one they put it in with, or the person who can't seem to retrieve their object even though the key they're using is literally the same object that was used to put it into the map (this is actually why Python doesn't allow its mutable built-in data types to be used as dictionary keys).
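The parenthetical about dictionary keys is easy to verify; a minimal sketch:

```python
# Python refuses mutable built-ins as dict keys outright:
counts = {}
try:
    counts[["Hello", "World"]] = 5   # a list is unhashable
except TypeError as e:
    print(e)  # unhashable type: 'list'

# An immutable key is fine:
counts[("Hello", "World")] = 5
```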

Your example is a good one of a case where Python newcomers will be surprised and bitten. But I'd argue that if we "fixed" this, then that would only create a different situation where they'd be bitten instead, and that one would be even less intuitive. Moreover, this is always the case when dealing with mutable variables; you always run into cases where someone could intuitively expect one or the opposite behavior depending on what code they are writing.

I personally like Python's current approach: default function arguments are evaluated when the function is defined, and that object is always the default. I suppose they could special-case using an empty list, but that kind of special-casing would cause even more astonishment, not to mention be backwards incompatible.

AFAICS no one has yet posted the relevant part of the documentation:

Default parameter values are evaluated when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that the same "pre-computed" value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. This is generally not what was intended. A way around this is to use None as the default, and explicitly test for it in the body of the function […]
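A minimal sketch of the workaround the documentation describes (the function name append_to is mine):

```python
def append_to(item, target=None):
    if target is None:
        target = []  # a brand-new list is created on every default call
    target.append(item)
    return target

print(append_to(5))  # [5]
print(append_to(5))  # [5] - no state leaks between calls
```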

I know nothing about the Python interpreter's inner workings (and I'm not an expert in compilers and interpreters either), so don't blame me if I propose anything unsensible or impossible.

Provided that Python objects are mutable, I think that this should be taken into account when designing the default arguments stuff. When you instantiate a list:

 a = [] 

you expect to get a new list referenced by a.

Why should the a=[] in

 def x(a=[]): 

instantiate a new list on function definition, and not on invocation? It's just like you're asking "if the user doesn't provide the argument, then instantiate a new list and use it as if it was produced by the caller". I think this is ambiguous instead:

 def x(a=datetime.datetime.now()): 

As a user, do you want a default datetime corresponding to when you're defining x, or to when you're executing it? In this case, as in the previous one, I would keep the same behavior as if the default argument "assignment" were the first instruction of the function (datetime.now() called on function invocation). On the other hand, if the user wanted the definition-time mapping, he could write:

b = datetime.datetime.now()

def x(a=b):

I know, I know: that's a closure. Alternatively, Python might provide a keyword to force definition-time binding:

 def x(static a=b): 
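For reference, a quick check (the function name stamp is mine) of what Python actually does with a datetime default like the one above, confirming that it is captured exactly once:

```python
import datetime
import time

def stamp(when=datetime.datetime.now()):  # evaluated once, at def time
    return when

first = stamp()
time.sleep(0.01)
second = stamp()
print(first is second)  # True - the very same datetime object on every call
```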

Well, the reason is quite simple: bindings are done when code is executed, and the function definition is executed, well… when the function is defined.

Compare this:

class BananaBunch:
    bananas = []

    def addBanana(self, banana):
        self.bananas.append(banana)

This code suffers from exactly the same unexpected happenstance. bananas is a class attribute, and hence, when you add things to it, it's added to all instances of that class. The reason is exactly the same.
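A sketch contrasting the shared class attribute with the usual per-instance fix (the SaneBunch name is mine):

```python
class BananaBunch:
    bananas = []                      # one list shared by the whole class

    def add_banana(self, banana):
        self.bananas.append(banana)

class SaneBunch:
    def __init__(self):
        self.bananas = []             # a fresh list per instance

    def add_banana(self, banana):
        self.bananas.append(banana)

b1, b2 = BananaBunch(), BananaBunch()
b1.add_banana("cavendish")
print(b2.bananas)  # ['cavendish'] - leaked across instances

s1, s2 = SaneBunch(), SaneBunch()
s1.add_banana("cavendish")
print(s2.bananas)  # [] - properly isolated
```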

It's just "how it works", and making it work differently in the function case would probably be complicated, and in the class case likely impossible — or at least it would slow down object instantiation a lot, since you would have to keep the class code around and execute it when objects are created.

Yes, it is unexpected. But once the penny drops, it fits in perfectly with how Python works in general. In fact, it's a good teaching aid, and once you understand why this happens, you'll grok Python much better.

That said, it should feature prominently in any good Python tutorial. Because, as you mention, everyone runs into this problem sooner or later.

I used to think that creating the objects at runtime would be the better approach. I'm less certain now, since you do lose some useful features, though it may be worth it regardless, simply to prevent newbie confusion. The disadvantages of doing so are:

1. Performance

def foo(arg=something_expensive_to_compute()):
    ...

If call-time evaluation were used, then the expensive function would be called every time your function is used without an argument. You'd either pay an expensive price on each call, or need to manually cache the value externally, polluting your namespace and adding verbosity.
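A rough sketch of that cost argument, using a call counter instead of a genuinely expensive computation (all names here are mine):

```python
call_count = 0

def something_expensive_to_compute():
    global call_count
    call_count += 1
    return sum(range(1000))

def foo(arg=something_expensive_to_compute()):  # runs once, at def time
    return arg

foo()
foo()
foo()
print(call_count)  # 1 - definition-time evaluation paid the cost only once
```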

2. Forcing bound parameters

A useful trick is to bind parameters of a lambda to the current binding of a variable when the lambda is created. For example:

 funcs = [ lambda i=i: i for i in range(10)] 

This returns a list of functions that return 0, 1, 2, 3… respectively. If the behavior were changed, they would instead bind i to the call-time value of i, so you would get a list of functions that all return 9.
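The contrast can be sketched side by side:

```python
# Early binding via a default argument:
early = [lambda i=i: i for i in range(10)]
print([f() for f in early])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Without the default, every lambda closes over the SAME variable i:
late = [lambda: i for i in range(10)]
print([f() for f in late])   # [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
```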

The only way to implement this otherwise would be to create a further closure with i bound, i.e.:

def make_func(i):
    return lambda: i

funcs = [make_func(i) for i in range(10)]

3. Introspection

Consider the following code:

def foo(a='test', b=100, c=[]):
    print a, b, c

We can get information about the arguments and default values using the inspect module:

>>> inspect.getargspec(foo)
(['a', 'b', 'c'], None, None, ('test', 100, []))

This information is very useful for things like documentation generation, metaprogramming, decorators, etc.

Now, suppose the behavior of defaults could be changed so that this is the equivalent of:

_undefined = object()  # sentinel value

def foo(a=_undefined, b=_undefined, c=_undefined):
    if a is _undefined:
        a = 'test'
    if b is _undefined:
        b = 100
    if c is _undefined:
        c = []

However, we've lost the ability to introspect and see what the default arguments are. Because the objects haven't been constructed, we can't ever get hold of them without actually calling the function. The best we could do is store off the source code and return that as a string.
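For completeness, the same introspection in modern Python is done with inspect.signature rather than getargspec; a sketch:

```python
import inspect

def foo(a='test', b=100, c=[]):
    return a, b, c

sig = inspect.signature(foo)
defaults = {name: p.default for name, p in sig.parameters.items()
            if p.default is not inspect.Parameter.empty}
print(defaults)  # {'a': 'test', 'b': 100, 'c': []}
```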

5 points in defense of Python

  1. Simplicity: The behavior is simple in the following sense: most people fall into this trap only once, not several times.

  2. Consistency: Python always passes objects, not names. The default parameter is, obviously, part of the function heading (not the function body). It therefore ought to be evaluated at module load time (and only at module load time, unless nested), not at function call time.

  3. Usefulness: As Frederik Lundh points out in his explanation of "Default Parameter Values in Python", the current behavior can be quite useful for advanced programming. (Use sparingly.)

  4. Sufficient documentation: In the most basic Python documentation, the tutorial, the issue is loudly announced as an "Important warning" in the first subsection of the section "More on Defining Functions". The warning even uses boldface, which is rarely applied outside of headings. RTFM: Read the fine manual.

  5. Meta-learning: Falling into the trap is actually a very helpful moment (at least if you are a reflective learner), because you will subsequently understand the point "Consistency" above much better, and that will teach you a great deal about Python.

This behavior is easy to explain by:

  1. a function (class, etc.) declaration is executed only once, creating all default value objects
  2. everything is passed by reference

So:

def x(a=0, b=[], c=[], d=0):
    a = a + 1
    b = b + [1]
    c.append(1)
    print a, b, c
  1. a doesn't change – every assignment creates a new int object – and the new object is printed
  2. b doesn't change – a new list is built from the default value and printed
  3. c changes – the operation is performed on the same object – and that object is printed
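In Python 3 syntax, the same demonstration, checking the stored defaults directly (the d parameter is dropped here since it is unused):

```python
def x(a=0, b=[], c=[]):
    a = a + 1      # rebinds the local name; the default int is untouched
    b = b + [1]    # builds a NEW list; the default list is untouched
    c.append(1)    # mutates the default list in place
    return a, b, c

x()
x()
print(x.__defaults__)  # (0, [], [1, 1]) - only c's default has changed
```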

Why don't you introspect?

I'm really surprised no one has performed the insightful introspection offered by Python (2 and 3 apply) on callables.

Given a simple little function func defined as:

>>> def func(a = []):
...     a.append(5)

When Python encounters it, the first thing it will do is compile it in order to create a code object for this function. While this compilation step is done, Python evaluates* and then stores the default arguments (an empty list [] here) in the function object itself. As the answer above mentioned: the list a can now be considered a member of the function func.

So, let's do some introspection — a before and after — to examine how the list gets expanded inside the function object. I'm using Python 3.x for this; for Python 2 the same applies (use __defaults__ or func_defaults in Python 2; yes, two names for the same thing).

Function before execution:

>>> def func(a = []):
...     a.append(5)
...

After Python executes this definition, it will take any default parameters specified (a = [] here) and cram them in the __defaults__ attribute of the function object (relevant section: Callables):

>>> func.__defaults__
([],)

OK, so an empty list as the single entry in __defaults__, just as expected.

Function after execution:

Let's now execute this function:

 >>> func() 

Now, let's look at those __defaults__ again:

>>> func.__defaults__
([5],)

Astonished? The value inside the object changed! Consecutive calls to the function will now simply append to that embedded list object:

>>> func(); func(); func()
>>> func.__defaults__
([5, 5, 5, 5],)

So, there you have it: the reason why this "flaw" happens is that default arguments are part of the function object. There's nothing weird going on here — it's all just a bit surprising.

The common solution to combat this is to use None as the default, and then initialize in the function body:

def func(a = None):
    # or: a = [] if a is None else a
    if a is None:
        a = []

Since the function body is re-executed each time, you always get a fresh new empty list if no argument was passed for a.


To further verify that the list in __defaults__ is the same as the one used in the function func, you can change your function to return the id of the list a used inside the function body. Then, compare it to the list in __defaults__ (position [0] in __defaults__), and you'll see how these are indeed referring to the same list instance:

>>> def func(a = []):
...     a.append(5)
...     return id(a)
>>>
>>> id(func.__defaults__[0]) == func()
True

All with the power of introspection!


* To verify that Python evaluates the default arguments during compilation of the function, try executing the following:

def bar(a=input('Did you just see me without calling the function?')):
    pass  # use raw_input in Py2

As you'll notice, input() is called before the process of building the function and binding it to the name bar is even done.

You're asking why this:

def func(a=[], b=2):
    pass

isn't internally equivalent to this:

def func(a=None, b=None):
    a_default = lambda: []
    b_default = lambda: 2
    def actual_func(a=None, b=None):
        if a is None:
            a = a_default()
        if b is None:
            b = b_default()
    return actual_func

func = func()

except for the case of explicitly calling func(None, None), which we'll ignore.

In other words, instead of evaluating default parameters, why not store each of them, and evaluate them when the function is called?

One answer is probably right there: it would effectively turn every function with default parameters into a closure. Even if it were all hidden away in the interpreter and not a full-blown closure, the data would have to be stored somewhere. It would be slower and use more memory.

1) The so-called problem of "mutable default argument" is, in general, a special example demonstrating that:
"All functions with this problem suffer also from a similar side-effect problem on the actual parameter,"
which is against the rules of functional programming, is usually undesirable, and should be fixed both together.

Example:

def foo(a=[]):           # the same problematic function
    a.append(5)
    return a

>>> somevar = [1, 2]     # an example without a default parameter
>>> foo(somevar)
[1, 2, 5]
>>> somevar
[1, 2, 5]  # usually expected [1, 2]

Solution: copy
An absolutely safe solution is to copy or deepcopy the input object first, and then to do whatever with the copy.

def foo(a=[]):
    a = a[:]   # a copy
    a.append(5)
    return a   # or everything safe by one line: "return a + [5]"

Many builtin mutable types have a copy method like some_dict.copy() or some_set.copy() , or can be copied easily like somelist[:] or list(some_list) . Every object can also be copied by copy.copy(any_object) , or more thoroughly by copy.deepcopy() (the latter is useful if the mutable object is composed of mutable objects). Some objects are fundamentally based on side effects, like a "file" object, and can not be meaningfully reproduced by copying.
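A small sketch of the shallow vs. deep distinction mentioned above:

```python
import copy

default = {'nums': [1, 2]}
shallow = copy.copy(default)       # new dict, but the inner list is shared
deep = copy.deepcopy(default)      # fully independent clone

default['nums'].append(3)
print(shallow['nums'])  # [1, 2, 3] - the nested list is still shared
print(deep['nums'])     # [1, 2]    - deepcopy severed the link
```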

Example problem for a similar SO question

class Test(object):      # the original problematic class
    def __init__(self, var1=[]):
        self._var1 = var1

somevar = [1, 2]         # an example without a default parameter
t1 = Test(somevar)
t2 = Test(somevar)
t1._var1.append([1])
print somevar   # [1, 2, [1]] but usually expected [1, 2]
print t2._var1  # [1, 2, [1]] but usually expected [1, 2]

Neither should it be saved in any public attribute of an instance returned by this function. (Assuming that private attributes of an instance should not, by convention, be modified from outside of this class or its subclasses; i.e. _var1 is a private attribute.)

Conclusion:
Input parameter objects shouldn't be modified in place (mutated), nor should they be bound into an object returned by the function. (If we prefer programming without side effects, which is strongly recommended. See the Wiki about "side effect" — the first two paragraphs are relevant in this context.)

2)
Only if the side effect on the actual parameter is required, but unwanted on the default parameter, then the useful solution is:

def ...(var1=None):
    if var1 is None:
        var1 = []

More..

3) In some cases, the mutable behavior of default parameters is useful.

This actually has nothing to do with default values, other than that it often comes up as an unexpected behaviour when you write functions with mutable default values.

>>> def foo(a):
...     a.append(5)
...     print a
>>> a = [5]
>>> foo(a)
[5, 5]
>>> foo(a)
[5, 5, 5]
>>> foo(a)
[5, 5, 5, 5]
>>> foo(a)
[5, 5, 5, 5, 5]

No default values in sight in this code, but you get exactly the same problem.

The problem is that foo is modifying a mutable variable passed in from the caller, when the caller doesn't expect this. Code like this would be fine if the function was called something like append_5 ; then the caller would be calling the function in order to modify the value they pass in, and the behaviour would be expected. But such a function would be very unlikely to take a default argument, and probably wouldn't return the list (since the caller already has a reference to that list; the one it just passed in).

Your original foo , with a default argument, shouldn't be modifying a whether it was explicitly passed in or got the default value. Your code should leave mutable arguments alone unless it is clear from the context/name/documentation that the arguments are supposed to be modified. Using mutable values passed in as arguments as local temporaries is an extremely bad idea, whether we're in Python or not and whether there are default arguments involved or not.

If you need to destructively manipulate a local temporary in the course of computing something, and you need to start your manipulation from an argument value, you need to make a copy.

It's a performance optimization. As a result of this functionality, which of these two function calls do you think is faster?

def print_tuple(some_tuple=(1,2,3)):
    print some_tuple

print_tuple()        #1
print_tuple((1,2,3)) #2

I'll give you a hint. Here's the disassembly (see http://docs.python.org/library/dis.html ):

# 1

 0 LOAD_GLOBAL              0 (print_tuple)
 3 CALL_FUNCTION            0
 6 POP_TOP
 7 LOAD_CONST               0 (None)
10 RETURN_VALUE

# 2

 0 LOAD_GLOBAL              0 (print_tuple)
 3 LOAD_CONST               4 ((1, 2, 3))
 6 CALL_FUNCTION            1
 9 POP_TOP
10 LOAD_CONST               0 (None)
13 RETURN_VALUE


As you can see, there is a performance benefit when using immutable default arguments. This can make a difference if it's a frequently called function or the default argument takes a long time to construct. Also, bear in mind that Python isn't C. In C you have constants that are pretty much free. In Python you don't have this benefit.

This behavior is not surprising if you take the following into consideration:

  1. The behavior of read-only class attributes upon assignment attempts, and that
  2. Functions are objects (explained well in the accepted answer).

The role of (2) has been covered extensively in this thread. (1) is likely the astonishment causing factor, as this behavior is not "intuitive" when coming from other languages.

(1) is described in the Python tutorial on classes . In an attempt to assign a value to a read-only class attribute:

…all variables found outside of the innermost scope are read-only ( an attempt to write to such a variable will simply create a new local variable in the innermost scope, leaving the identically named outer variable unchanged ).

Look back to the original example and consider the above points:

 def foo(a=[]): a.append(5) return a 

Here foo is an object and a is an attribute of foo (available at foo.func_defaults[0] ). Since a is a list, a is mutable and is thus a read-write attribute of foo . It is initialized to the empty list as specified by the signature when the function is instantiated, and is available for reading and writing as long as the function object exists.

Calling foo without overriding a default uses that default's value from foo.func_defaults . In this case, foo.func_defaults[0] is used for a within the function object's code scope. Changes to a change foo.func_defaults[0] , which is part of the foo object and persists between executions of the code in foo .

Now, compare this to the example from the documentation on emulating the default argument behavior of other languages , such that the function signature defaults are used every time the function is executed:

 def foo(a, L=None): if L is None: L = [] L.append(a) return L 

Taking (1) and (2) into account, one can see why this accomplishes the desired behavior:

  • When the foo function object is instantiated, foo.func_defaults[0] is set to None , an immutable object.
  • When the function is executed with defaults (with no parameter specified for L in the function call), foo.func_defaults[0] ( None ) is available in the local scope as L .
  • Upon L = [] , the assignment cannot succeed at foo.func_defaults[0] , because that attribute is read-only.
  • Per (1) , a new local variable also named L is created in the local scope and used for the remainder of the function call. foo.func_defaults[0] thus remains unchanged for future invocations of foo .

Already a busy topic, but from what I read here, the following helped me realize how it works internally:

def bar(a=[]):
    print id(a)
    a = a + [1]
    print id(a)
    return a

>>> bar()
4484370232
4484524224
[1]
>>> bar()
4484370232
4484524152
[1]
>>> bar()
4484370232  # Never changes, this is the 'class property' of the function
4484523720  # Always a new object
[1]
>>> id(bar.func_defaults[0])
4484370232

A simple workaround using None

>>> def bar(b, data=None):
...     data = data or []
...     data.append(b)
...     return data
...
>>> bar(3)
[3]
>>> bar(3)
[3]
>>> bar(3)
[3]
>>> bar(3, [34])
[34, 3]
>>> bar(3, [34])
[34, 3]
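One caveat with data = data or []: it discards any falsy argument (an empty list, 0, an empty string), not just a missing one. A sketch of the difference (the function names here are mine):

```python
def bar_or(b, data=None):
    data = data or []   # replaces ANY falsy argument, not just a missing one
    data.append(b)
    return data

def bar_none(b, data=None):
    if data is None:    # replaces only a genuinely absent argument
        data = []
    data.append(b)
    return data

caller_list = []
bar_or(1, caller_list)    # the caller's empty list is silently discarded
bar_none(2, caller_list)  # the caller's list is respected
print(caller_list)        # [2]
```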

The solutions here are:

  1. Use None as your default value (or a nonce object ), and switch on that to create your values at runtime; or
  2. Use a lambda as your default parameter, and call it within a try block to get the default value (this is the sort of thing that lambda abstraction is for).

The second option is nice because users of the function can pass in a callable, which may already exist (such as a type ).
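A sketch of that second option, with a type (here list) as the default factory. The signature is mine, and note that callers must now pass a zero-argument callable rather than a plain value:

```python
def foo(a=list):            # the default is a zero-argument callable...
    a = a()                 # ...invoked at call time to build the real value
    a.append(5)
    return a

print(foo())                 # [5]
print(foo())                 # [5] - a fresh list every time
print(foo(lambda: [1, 2]))   # [1, 2, 5]
```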

the shortest answer would probably be "definition is execution", therefore the whole argument makes no strict sense. as a more contrived example, you may cite this:

def a():
    return []

def b(x=a()):
    print x

hopefully it's enough to show that not executing the default argument expressions at the execution time of the def statement isn't easy or doesn't make sense, or both.

i agree it's a gotcha when you try to use default constructors, though.

I sometimes exploit this behavior as an alternative to the following pattern:

singleton = None

def use_singleton():
    global singleton
    if singleton is None:
        singleton = _make_singleton()
    return singleton.use_me()

If singleton is only used by use_singleton , I like the following pattern as a replacement:

# _make_singleton() is called only once, when the def is executed
def use_singleton(singleton=_make_singleton()):
    return singleton.use_me()

I've used this for instantiating client classes that access external resources, and also for creating dicts or lists for memoization.

Since I don't think this pattern is well known, I do put a short comment in to guard against future misunderstandings.
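A sketch of the memoization variant mentioned above, using a definition-time dict as a private cache (the names are mine):

```python
def fib(n, _cache={}):          # _cache is built once, when the def executes
    if n not in _cache:
        _cache[n] = n if n < 2 else fib(n - 1) + fib(n - 2)
    return _cache[n]

print(fib(30))  # 832040 - fast, thanks to the cache shared across calls
```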

I am going to demonstrate an alternative structure to pass a default list value to a function (it works equally well with dictionaries).

As others have extensively commented, the list parameter is bound to the function when it is defined, as opposed to when it is executed. Because lists and dictionaries are mutable, any alteration to this parameter will affect other calls to this function. As a result, subsequent calls to the function will receive this shared list, which may have been altered by any other call to the function. Worse yet, two callers may be using this function's shared list at the same time, each oblivious to the changes made by the other.

Wrong Method (probably…) :

def foo(list_arg=[5]):
    return list_arg

a = foo()
a.append(6)
>>> a
[5, 6]

b = foo()
b.append(7)
# The value of 6 appended to variable 'a' is now part of the list held by 'b'.
>>> b
[5, 6, 7]

# Although 'a' is expecting to receive 6 (the last element it appended to the list),
# it actually receives the last element appended to the shared list.
# It thus receives the value 7 previously appended by 'b'.
>>> a.pop()
7

You can verify that they are one and the same object by using id :

>>> id(a)
5347866528
>>> id(b)
5347866528

Per Brett Slatkin's "Effective Python: 59 Specific Ways to Write Better Python", Item 20: Use None and Docstrings to specify dynamic default arguments (p. 48)

The convention for achieving the desired result in Python is to provide a default value of None and to document the actual behaviour in the docstring.

This implementation ensures that each call to the function either receives the default list or else the list passed to the function.

Preferred Method :

def foo(list_arg=None):
    """
    :param list_arg: A list of input values.
                     If none provided, uses a list with a default value of 5.
    """
    if not list_arg:
        list_arg = [5]
    return list_arg

a = foo()
a.append(6)
>>> a
[5, 6]

b = foo()
b.append(7)
>>> b
[5, 7]

c = foo([10])
c.append(11)
>>> c
[10, 11]

There may be legitimate use cases for the 'Wrong Method' whereby the programmer intended the default list parameter to be shared, but this is more likely the exception than the rule.

You can get round this by replacing the object (and therefore the tie with the scope):

def foo(a=[]):
    a = list(a)
    a.append(5)
    return a

Ugly, but it works.

When we do this:

 def foo(a=[]): ... 

… we assign the argument a to an unnamed list, if the caller does not pass the value of a.

To make things simpler for this discussion, let's temporarily give the unnamed list a name. How about pavlo ?

 def foo(a=pavlo): ... 

At any time, if the caller doesn't tell us what a is, we reuse pavlo .

If pavlo is mutable (modifiable), and foo ends up modifying it, we notice the effect the next time foo is called without specifying a .

So this is what you see (Remember, pavlo is initialized to []):

>>> foo()
[5]

Now, pavlo is [5].

Calling foo() again modifies pavlo again:

>>> foo()
[5, 5]

Specifying a when calling foo() ensures pavlo is not touched.

>>> ivan = [1, 2, 3, 4]
>>> foo(a=ivan)
[1, 2, 3, 4, 5]
>>> ivan
[1, 2, 3, 4, 5]

So, pavlo is still [5, 5] .

>>> foo()
[5, 5, 5]

It may be true that:

  1. Someone is using every language/library feature, and
  2. Switching the behavior here would be ill-advised, but

it is entirely consistent to hold to both of the features above and still make another point:

  1. It is a confusing feature and it is unfortunate in Python.

The other answers, or at least some of them, either make points 1 and 2 but not 3, or make point 3 and downplay points 1 and 2. But all three are true.

It may be true that switching horses in midstream here would be asking for significant breakage, and that there could be more problems created by changing Python to intuitively handle Stefano's opening snippet. And it may be true that someone who knew Python internals well could explain a minefield of consequences. However,

The existing behavior is not Pythonic, and Python is successful because very little about the language violates the principle of least astonishment anywhere near this badly. It is a real problem, whether or not it would be wise to uproot it. It is a design flaw. If you understand the language much better by trying to trace out the behavior, I can say that C++ does all of this and more; you learn a lot by navigating, for instance, subtle pointer errors. But this is not Pythonic: people who care about Python enough to persevere in the face of this behavior are people who are drawn to the language because Python has far fewer surprises than other languages. Dabblers and the curious become Pythonistas when they are astonished at how little time it takes to get something working – not because of a design fl – I mean, hidden logic puzzle – that cuts against the intuitions of programmers who are drawn to Python because it Just Works .

Python: The Mutable Default Argument

Default arguments are evaluated at the time the function is compiled into a function object. Each time that function uses them, however many times that is, they are and remain the same object.

When they are mutable, when mutated (for example, by adding an element to it) they remain mutated on consecutive calls.

They stay mutated because they are the same object each time.

Demonstration

Here's a demonstration – you can verify that they are the same object each time they are referenced by

  • seeing that the list is created before the function has finished compiling to a function object,
  • observing that the id is the same each time the list is referenced,
  • observing that the list stays changed when the function that uses it is called a second time,
  • observing the order in which the output is printed from the source (which I conveniently numbered for you):

example.py

print('1. Global scope being evaluated')

def create_list():
    '''noisily create a list for usage as a kwarg'''
    l = []
    print('3. list being created and returned, id: ' + str(id(l)))
    return l

print('2. example_function about to be compiled to an object')

def example_function(default_kwarg1=create_list()):
    print('appending "a" in default default_kwarg1')
    default_kwarg1.append("a")
    print('list with id: ' + str(id(default_kwarg1)) + ' - is now: '
          + repr(default_kwarg1))

print('4. example_function compiled: ' + repr(example_function))

if __name__ == '__main__':
    print('5. calling example_function twice!:')
    example_function()
    example_function()

and running it with python example.py :

1. Global scope being evaluated
2. example_function about to be compiled to an object
3. list being created and returned, id: 140502758808032
4. example_function compiled: <function example_function at 0x7fc9590905f0>
5. calling example_function twice!:
appending "a" in default default_kwarg1
list with id: 140502758808032 - is now: ['a']
appending "a" in default default_kwarg1
list with id: 140502758808032 - is now: ['a', 'a']

Does this violate the principle of "Least Astonishment"?

This order of execution is frequently confusing to new users of Python. If you understand the Python execution model, then it becomes quite expected.

The usual instruction to new Python users:

But this is why the usual instruction to new users is to create their default arguments like this instead:

def example_function_2(default_kwarg=None):
    if default_kwarg is None:
        default_kwarg = []

This uses the None singleton as a sentinel object to tell the function whether or not we've gotten an argument other than the default. If we get no argument, then we actually want to use a new empty list, [] , as the default.
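When None itself must remain a legal argument value, the same idea works with a private sentinel object (the _MISSING name here is mine):

```python
_MISSING = object()   # a unique sentinel; the name is arbitrary

def f(value=_MISSING):
    if value is _MISSING:
        value = []    # nothing was passed: build a fresh default
    return value

print(f())        # [] - a new list on each default call
print(f(None))    # None - now None can be passed explicitly
```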

As the tutorial section on control flow says:

If you don't want the default to be shared between subsequent calls, you can write the function like this instead:

def f(a, L=None):
    if L is None:
        L = []
    L.append(a)
    return L

This "bug" gave me a lot of overtime work hours! But I'm beginning to see a potential use of it (but I would have liked it to be at the execution time, still)

I'm gonna give you what I see as a useful example.

def example(errors=[]):
    # statements
    # Something went wrong
    mistake = True
    if mistake:
        tryToFixIt(errors)            # Didn't work.. let's try again
        tryToFixItAnotherway(errors)  # This time it worked
    return errors

def tryToFixIt(err):
    err.append('Attempt to fix it')

def tryToFixItAnotherway(err):
    err.append('Attempt to fix it by another way')

def main():
    for item in range(2):
        errors = example()
    print '\n'.join(errors)

main()

prints the following

Attempt to fix it
Attempt to fix it by another way
Attempt to fix it
Attempt to fix it by another way

I think the answer to this question lies in how Python passes data to parameters (by value or by reference), not in mutability or in how Python handles the def statement.

A brief introduction. First, there are two types of data in Python: one is simple elementary data, like numbers, and the other is objects. Second, when passing data to parameters, Python passes elementary data by value, i.e., makes a local copy of the value into a local variable, but passes objects by reference, i.e., as pointers to the object.

Admitting the above two points, let's explain what happened to the Python code. It's only because of passing by reference for objects; it has nothing to do with mutable/immutable, or, arguably, with the fact that the def statement is executed only once when it is defined.

[] is an object, so Python passes the reference of [] to a , i.e., a is only a pointer to [] which lies in memory as an object. There is only one copy of [] with, however, many references to it. For the first foo(), the list [] is changed to [1] by the append method. Note that there is only one copy of the list object, and this object now becomes [1] . When running the second foo(), what the effbot webpage says (that items is not evaluated any more) is wrong. a is evaluated to be the list object, although now the content of the object is [1] . This is the effect of passing by reference! The result of foo(3) can be easily derived in the same way.

To further validate my answer, let's take a look at two additional codes.

====== No. 2 ========

def foo(x, items=None):
    if items is None:
        items = []
    items.append(x)
    return items

foo(1)  # return [1]
foo(2)  # return [2]
foo(3)  # return [3]

[] is an object, and so is None (the former is mutable while the latter is immutable, but the mutability has nothing to do with the question). None is somewhere out there in space, but we know it's there, and there is only one copy of None. So every time foo is invoked, items is evaluated (as opposed to some answers saying it is only evaluated once) to be None — to be clear, to the reference (or the address) of None. Then in foo, items is changed to [] , i.e., points to another object which has a different address.

====== No. 3 =======

def foo(x, items=[]):
    items.append(x)
    return items

foo(1)     # returns [1]
foo(2,[])  # returns [2]
foo(3)     # returns [1,3]

The invocation of foo(1) makes items point to a list object [] with an address, say, 11111111. The content of the list is changed to [1] in the foo function in the sequel, but the address is not changed; it is still 11111111. Then foo(2,[]) comes along. Although the [] in foo(2,[]) has the same content as the default parameter [] when calling foo(1), their addresses are different! Since we provide the parameter explicitly, items has to take the address of this new [] , say 2222222, and return it after making some change. Now foo(3) is executed. Since only x is provided, items has to take its default value again. What's the default value? It was set when defining the foo function: the list object located at 11111111. So items is evaluated to be the address 11111111, holding the element 1. The list located at 2222222 also contains one element, 2, but it is not pointed to by items any more. Consequently, appending 3 will make items [1,3].

From the above explanations, we can see that the effbot webpage recommended in the accepted answer failed to give a relevant answer to this question. What is more, I think a point in the effbot webpage is wrong. I think the code regarding the UI.Button is correct:

 for i in range(10): def callback(): print "clicked button", i UI.Button("button %s" % i, callback) 

Each button can hold a distinct callback function which will display different value of i . I can provide an example to show this:

 x=[] for i in range(10): def callback(): print(i) x.append(callback) 

If we execute x[7]() we'll get 7 as expected, and x[9]() will gives 9, another value of i .

>>> def a():
...     print "a executed"
...     return []
...
>>> x = a()
a executed
>>> def b(m=[]):
...     m.append(5)
...     print m
...
>>> b(x)
[5]
>>> b(x)
[5, 5]

This is not a design flaw . Anyone who trips over this is doing something wrong.

There are 3 cases I see where you might run into this problem:

  1. You intend to modify the argument as a side effect of the function. In this case it never makes sense to have a default argument. The only exception is when you're abusing the argument list to have function attributes, eg cache={} , and you wouldn't be expected to call the function with an actual argument at all.
  2. You intend to leave the argument unmodified, but you accidentally did modify it. That's a bug, fix it.
  3. You intend to modify the argument for use inside the function, but didn't expect the modification to be viewable outside of the function. In that case you need to make a copy of the argument, whether it was the default or not! Python is not a call-by-value language so it doesn't make the copy for you, you need to be explicit about it.

The example in the question could fall into category 1 or 3. It's odd that it both modifies the passed list and returns it; you should pick one or the other.
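A sketch of handling case 3 correctly — copying before any destructive work (the function name is mine):

```python
def with_five(items=()):
    # Case 3: copy before destructive work, so neither the caller's
    # object nor the default is ever mutated.
    local = list(items)
    local.append(5)
    return local

shared = [1, 2]
print(with_five(shared))  # [1, 2, 5]
print(shared)             # [1, 2] - untouched
print(with_five())        # [5]
```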

Just change the function to be:

def notastonishinganymore(a=[]):
    '''The name is just a joke :)'''
    del a[:]
    a.append(5)
    return a

Architecture

Assigning default values in a function call is a code smell.

def a(b=[]):
    pass

This is a signature of a function that is up to no good. Not just because of the problems described by other answers. I won't go in to that here.

This function aims to do two things. Create a new list, and execute a functionality, most likely on said list.

Functions that do two things are bad functions, as we learn from clean code practices.

Attacking this problem with polymorphism, we would extend the python list or wrap one in a class, then perform our function upon it.

But wait you say, I like my one-liners.

Well, guess what. Code is more than just a way to control the behavior of hardware. It's a way of:

  • communicating with other developers, working on the same code.

  • being able to change the behavior of the hardware when new requirements arise.

  • being able to understand the flow of the program after you pick up the code again after two years to make the change mentioned above.

Don't leave time-bombs for yourself to pick up later.

Separating this function into the two things it does, we need a class

class ListNeedsFives(object):
    def __init__(self, b=None):
        if b is None:
            b = []
        self.b = b

    def foo(self):
        self.b.append(5)

Executed by

a = ListNeedsFives()
a.foo()
a.b

And why is this better than mashing all the above code into a single function?

def dontdothis(b=None):
    if b is None:
        b = []
    b.append(5)
    return b

Why not do this?

Unless you fail in your project, your code will live on. Most likely your function will be doing more than this. The proper way of making maintainable code is to separate code into atomic parts with a properly limited scope.

The constructor of a class is a very commonly recognized component to anyone who has done Object Oriented Programming. Placing the logic that handles the list instantiation in the constructor makes the cognitive load of understanding what the code does smaller.

The method foo() does not return the list, why not?

In returning a standalone list, you could assume that it's safe to do whatever you feel like to it. But it may not be, since it is also shared by the object a . Forcing the user to refer to it as a.b reminds them where the list belongs. Any new code that wants to modify a.b will naturally be placed in the class, where it belongs.

The function with the def dontdothis(b=None): signature has none of these advantages.