Python: defining a class and pickling an instance in the same file

3 December, 2005 - 13:12

The Python pickle module is an interesting and helpful module. It offers an easy way to save and load your own Python data structures (instances of your own classes) without having to design your own file format and implement import and export procedures. This text is about what can happen if you try to pickle an instance of a class that is defined in the same (script) file. A scenario in which this could happen: you have a module that defines some classes with save and load functionality through the pickle module, and besides importing that module in other Python scripts, you also want to be able to run it as a standalone script. (As a side note: Python version 2.4.2 was used for these "experiments".)
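As a minimal sketch of that save and load round trip, the snippet below pickles a plain dictionary to a file and reads it back. It is written for a current Python 3 interpreter rather than the 2.4.2 used in the rest of this text, and the file name "settings.pickle" and the variable names are made up for illustration.

```python
import os
import pickle
import tempfile

# Any picklable data structure will do; a dictionary keeps the example small.
settings = {"name": "Pat", "values": [1, 2, 3]}

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "settings.pickle")

# Save: pickle writes bytes, so the file is opened in binary mode.
with open(path, "wb") as handle:
    pickle.dump(settings, handle)

# Load: the result is an equal, but separate, copy of the original object.
with open(path, "rb") as handle:
    restored = pickle.load(handle)
```

A dictionary is used on purpose: built-in types do not depend on where a class was defined, which is exactly the complication this text is about.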

Consider the following python module and python script. File pat.py is a python module defining the class "Pat":

print "defining the class Pat"
class Pat:
    def __init__(self, x):
        self.data = "I'm Pat '%d'" % x

File saver.py is a python script that defines a class "Puis" and pickles an instance of Pat and an instance of Puis:

#!/usr/bin/python
 
import pickle
 
print "defining the class Puis"
class Puis:
    def __init__(self, x):
        self.data = "I'm Puis '%d'" % x
 
# code in case of standalone usage
if __name__ == "__main__":
    from pat import *
    my_pat  = Pat(1234)
    my_puis = Puis(5678)
    pickle.dump(my_pat, file('my_pat.pickle','w'))
    pickle.dump(my_puis, file('my_puis.pickle','w'))

After running saver.py we have two pickle files my_pat.pickle and my_puis.pickle containing the instances in our working directory. When we try to load the instances with another script loader.py:

#!/usr/bin/python
 
import pickle
 
my_pat = pickle.load(file('my_pat.pickle'))
print my_pat.data
 
my_puis = pickle.load(file('my_puis.pickle'))
print my_puis.data

we get the following output:

$> ./loader.py
defining the class Pat
I'm Pat '1234'
Traceback (most recent call last):
  File "./loader.py", line 8, in ?
    my_puis = pickle.load(file('my_puis.pickle'))
  File "/usr/lib/python2.4/pickle.py", line 1390, in load
    return Unpickler(file).load()
  File "/usr/lib/python2.4/pickle.py", line 872, in load
    dispatch[key](self)
  File "/usr/lib/python2.4/pickle.py", line 1083, in load_inst
    klass = self.find_class(module, name)
  File "/usr/lib/python2.4/pickle.py", line 1140, in find_class
    klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'Puis'

Apparently the loading of my_pat.pickle was successful, but loading my_puis.pickle failed. The reason for this is that unpickling an instance involves defining its class by loading the appropriate module (if this module is not already loaded). Look for example at my_pat.pickle:

(ipat
Pat
p0
(dp1
S'data'
p2
S"I'm Pat '1234'"
p3
sb.

The first line indicates that module "pat" should be loaded for unpickling this file. But look at my_puis.pickle:

(i__main__
Puis
p0
(dp1
S'data'
p2
S"I'm Puis '5678'"
p3
sb.

The saved module indication is "__main__", referring to the top level scope of saver.py. But during the execution of loader.py, "__main__" refers to the top level of loader.py, in which no class "Puis" is defined. This is what raises the AttributeError.
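The lookup that fails here can be sketched in a few lines. The helper lookup_class below is a hypothetical stand-in for the steps visible in the traceback (import the module named in the pickle, then fetch the class with getattr); it is not the actual pickle source. The module "fake_pat" is also an assumption, created on the fly so the example is self-contained, and the code targets a current Python 3 interpreter rather than 2.4.2.

```python
import pickle
import sys
import types


def lookup_class(module_name, class_name):
    # Sketch of the unpickler's lookup: import the module named in the
    # pickle, then fetch the class from that module by name.
    __import__(module_name)
    mod = sys.modules[module_name]
    return getattr(mod, class_name)


# Hypothetical stand-in for an importable module such as pat.py.
fake_pat = types.ModuleType("fake_pat")
exec("class Pat(object):\n    pass\n", fake_pat.__dict__)
sys.modules["fake_pat"] = fake_pat

# A class remembers the module it was defined in, and pickle stores that name.
recorded = fake_pat.Pat.__module__
blob = pickle.dumps(fake_pat.Pat())
name_in_pickle = recorded.encode("ascii") in blob

# Looking the class up through an importable module works.
found = lookup_class(recorded, "Pat")

# A class pickled from a script is recorded under "__main__". Looking it up
# from a different script fails, because that script's __main__ has no "Puis".
try:
    lookup_class("__main__", "Puis")
    lookup_failed = False
except AttributeError:
    lookup_failed = True
```

The point of the sketch is that the pickle only carries two names, a module and a class, and the loading side has to be able to turn those names back into a class object.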

Two possible solutions to this problem:

  • Import the class "Puis" at the top level of loader.py:
    #!/usr/bin/python
     
    import pickle
    from saver import Puis  ### look here ###
     
    my_pat = pickle.load(file('my_pat.pickle'))
    print my_pat.data
     
    my_puis = pickle.load(file('my_puis.pickle'))
    print my_puis.data

    When running this, my_puis.pickle is loaded successfully:

    $> ./loader.py
    defining the class Pat
    defining the class Puis
    I'm Pat '1234'
    I'm Puis '5678'

    The drawback of this approach is that the loader is responsible for knowing and loading the right module, while it would be better if the saver offered the right information in the file itself.

  • Force the encoding of the right module name in the pickle file. I don't know if this is a decent way to solve the problem. It works in a way, but the problem with subclasses of builtin classes (see below) indicates that there might be something wrong with it.

    It can be done by explicitly setting the "__module__" attribute of the class "Puis" to "saver" in saver.py (the name is derived from the file name here):

    #!/usr/bin/python
     
    import os
    import pickle
     
    print "defining the class Puis"
    class Puis:
        __module__ = os.path.splitext(os.path.basename(__file__))[0]  ### look here ###
        def __init__(self, x):
            self.data = "I'm Puis '%d'" % x
     
    if __name__=="__main__":
        from pat import *
        my_pat  = Pat(1234)
        my_puis = Puis(5678)
     
        pickle.dump(my_pat, file('my_pat.pickle','w'))
        pickle.dump(my_puis, file('my_puis.pickle','w'))

    Now my_puis.pickle refers to the right module ("saver"):

    (isaver
    Puis
    p0
    (dp1
    S'data'
    p2
    S"I'm Puis '5678'"
    p3
    sb.

    The drawback of this approach is that it can fail if you apply it to subclasses of builtin classes. The following variation of saver.py defines the class "Piem" as a subclass of the builtin class list:

    #!/usr/bin/python
     
    import os
    import pickle
     
    print "defining the class Piem"
    class Piem(list):  ### look here ###
        __module__ = os.path.splitext(os.path.basename(__file__))[0]
        def __init__(self, x):
            self.data = "I'm Piem '%d'" % x
     
    if __name__ == "__main__":
        my_piem = Piem(9012)
        pickle.dump(my_piem, file('my_piem.pickle','w'))

    Running this script generates the following output:

    defining the class Piem
    defining the class Piem
    Traceback (most recent call last):
      File "./saver.py", line 28, in ?
        pickle.dump(my_piem, file('my_piem.pickle','w'))
      File "/usr/lib/python2.4/pickle.py", line 1382, in dump
        Pickler(file, protocol, bin).dump(obj)
      File "/usr/lib/python2.4/pickle.py", line 231, in dump
        self.save(obj)
      File "/usr/lib/python2.4/pickle.py", line 338, in save
        self.save_reduce(obj=obj, *rv)
      File "/usr/lib/python2.4/pickle.py", line 415, in save_reduce
        save(args)
      File "/usr/lib/python2.4/pickle.py", line 293, in save
        f(self, obj) # Call unbound method with explicit self
      File "/usr/lib/python2.4/pickle.py", line 576, in save_tuple
        save(element)
      File "/usr/lib/python2.4/pickle.py", line 293, in save
        f(self, obj) # Call unbound method with explicit self
      File "/usr/lib/python2.4/pickle.py", line 765, in save_global
        raise PicklingError(
    pickle.PicklingError: Can't pickle <class 'saver.Piem'>: it's not the same object as saver.Piem

    Besides the PicklingError you can observe that the class is defined twice (the first two lines of the output are the same). Apparently the pickling involves importing saver.py as a module in this case.
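A likely reading of the "not the same object" message is that the class defined while the script runs as "__main__" and the class defined when saver.py is imported as a module are two separate class objects that merely share a name. The sketch below imitates that situation under stated assumptions: the modules "fake_script" (playing the role of "__main__") and "fake_saver" (playing the role of the imported "saver") are hypothetical and built from the same source string, and the code targets a current Python 3 interpreter rather than 2.4.2.

```python
import pickle
import sys
import types

SOURCE = '''
class Piem(list):
    def __init__(self, x):
        list.__init__(self)
        self.data = "I'm Piem '%d'" % x
'''


def load_as(name):
    # Execute the same source under a given module name and register it,
    # roughly what happens when one file is both run and imported.
    mod = types.ModuleType(name)
    exec(SOURCE, mod.__dict__)
    sys.modules[name] = mod
    return mod


script = load_as("fake_script")  # stands in for __main__
saver = load_as("fake_saver")    # stands in for the imported module "saver"

# Same source, same name, but two distinct class objects.
same_object = script.Piem is saver.Piem

# Relabel the script's class, as the __module__ trick does.
script.Piem.__module__ = "fake_saver"
try:
    pickle.dumps(script.Piem(9012))
    error = None
except pickle.PicklingError as exc:
    error = exc  # pickle finds fake_saver.Piem, which is a different object

# Possible workaround: create the instance from the imported module's class.
blob = pickle.dumps(saver.Piem(9012))
restored = pickle.loads(blob)
```

Applied to the real script, this would mean writing something like "import saver" inside the standalone block and creating the instance as saver.Piem(9012), so that the pickled class really is the one found in the module "saver". This is an assumption about a possible fix, not something that was tested with Python 2.4.2 here.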