Python IO and Serialisation Snippets


numpy, pathlib and combined json encoders

Being able to serialize arbitrary objects to json is very handy, not only for exchanging data but also for logging and debugging. However, for unknown object types, you’ll get a

TypeError: Object of type ... is not JSON serializable

when trying to serialize them. This error indicates that the json.JSONEncoder needs some help to figure out what to do. Luckily, you can supply your own handler to deal with these types.

I’ve seen solutions on StackOverflow that either create a new encoder by inheriting from json.JSONEncoder or overwrite the default method of json.JSONEncoder.

Both solutions work fine, but, by accident, I found a more pythonic approach in the official code (no clue why the official docs show the inheritance rather than the functional approach). You can supply a custom encoding function via the default argument of the json module’s dump and dumps functions.

I picked numpy and Path from pathlib as examples.

Encoding numpy arrays

To deal with numpy arrays, convert the array to a list before feeding it to the JSONEncoder:

import numpy as np
import json

def encode_np_array(obj):
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Object of type {obj.__class__.__name__} is not JSON serializable")

Supply the function as the default argument to e.g. json.dumps:

data = {"some_key": np.array([1, 2, 3])}
encoded = json.dumps(data, default=encode_np_array)
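Note that tolist() converts nested arrays recursively, and that decoding naturally gives you plain lists back, not arrays. A small sketch of the round trip (reusing the encoder from above):

```python
import json

import numpy as np

def encode_np_array(obj):
    # same encoder as above: arrays become (nested) lists
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Object of type {obj.__class__.__name__} is not JSON serializable")

encoded = json.dumps({"matrix": np.eye(2)}, default=encode_np_array)
print(encoded)  # {"matrix": [[1.0, 0.0], [0.0, 1.0]]}

# json.loads knows nothing about numpy, so restoring the array is explicit
restored = np.array(json.loads(encoded)["matrix"])
```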

Encoding Path objects

Analogous to the numpy example, you can also serialize Path objects:

from pathlib import Path

def encode_path(obj):
    if isinstance(obj, Path):
        return str(obj)
    raise TypeError(f"Object of type {obj.__class__.__name__} is not JSON serializable")

Here too, supply the function as the default argument to e.g. json.dumps:

data = {"some_key": Path("/usr/bin/python3")}
encoded = json.dumps(data, default=encode_path)

Combine encoders

It gets a bit trickier when you want to combine several encoders. I chose to create a higher-order function (in my case: a function that returns a function). combine_encoders takes any number of encoder functions and returns a new function that tries each encoder in turn, moving on whenever one raises a TypeError.

def combine_encoders(*encs):
    def combined(obj):
        for enc in encs:
            try:
                return enc(obj)
            except TypeError:
                continue
        raise TypeError(f"Object of type {obj.__class__.__name__} is not JSON serializable")

    return combined

Using this construct is straightforward:

data = {"path": Path("/usr/bin/python3"), "numpy": np.array([1, 2, 3])}
combined_encoders = combine_encoders(encode_np_array, encode_path)
encoded = json.dumps(data, default=combined_encoders)
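The combined encoder is easy to extend: any function with the same contract (return something serializable or raise a TypeError) can be added to the chain. A sketch with a hypothetical encode_set (not one of the encoders above), also showing that genuinely unsupported types still raise:

```python
import json

def encode_set(obj):
    # hypothetical extra encoder: sets become sorted lists
    if isinstance(obj, set):
        return sorted(obj)
    raise TypeError(f"Object of type {obj.__class__.__name__} is not JSON serializable")

def combine_encoders(*encs):
    # same higher-order function as above
    def combined(obj):
        for enc in encs:
            try:
                return enc(obj)
            except TypeError:
                continue
        raise TypeError(f"Object of type {obj.__class__.__name__} is not JSON serializable")
    return combined

encoder = combine_encoders(encode_set)
print(json.dumps({"tags": {"b", "a"}}, default=encoder))  # {"tags": ["a", "b"]}

try:
    json.dumps({"oops": object()}, default=encoder)
except TypeError as exc:
    print(exc)  # Object of type object is not JSON serializable
```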

Parse YAML dict, keeping duplicates

PyYAML’s default loader silently keeps only the last value when a mapping repeats a key. The loader below collects the values of duplicated keys into a list instead:

from collections import Counter
import yaml

def parse_preserving_duplicates(src):
    class PreserveDuplicatesLoader(yaml.loader.Loader):
        pass

    def map_constructor(loader, node, deep=False):
        keys = [loader.construct_object(node, deep=deep) for node, _ in node.value]
        vals = [loader.construct_object(node, deep=deep) for _, node in node.value]
        key_count = Counter(keys)
        data = {}
        for key, val in zip(keys, vals):
            if key_count[key] > 1:
                if key not in data:
                    data[key] = []
                data[key].append(val)
            else:
                data[key] = val
        return data

    PreserveDuplicatesLoader.add_constructor(
        yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, map_constructor)
    return yaml.load(src, PreserveDuplicatesLoader)
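A quick usage sketch with a document that defines port twice (the function is repeated in full here so the snippet runs on its own; host and port values are made up for illustration):

```python
from collections import Counter

import yaml

def parse_preserving_duplicates(src):
    class PreserveDuplicatesLoader(yaml.loader.Loader):
        pass

    def map_constructor(loader, node, deep=False):
        # collect key and value objects pairwise from the mapping node
        keys = [loader.construct_object(node, deep=deep) for node, _ in node.value]
        vals = [loader.construct_object(node, deep=deep) for _, node in node.value]
        key_count = Counter(keys)
        data = {}
        for key, val in zip(keys, vals):
            if key_count[key] > 1:
                if key not in data:
                    data[key] = []
                data[key].append(val)
            else:
                data[key] = val
        return data

    PreserveDuplicatesLoader.add_constructor(
        yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, map_constructor)
    return yaml.load(src, PreserveDuplicatesLoader)

doc = """
host: example.org
port: 8080
port: 8081
"""
print(parse_preserving_duplicates(doc))  # {'host': 'example.org', 'port': [8080, 8081]}
```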

Dedent text

textwrap.dedent strips the common leading whitespace from a multi-line string. This is handy when a YAML document (or any other indented text) is embedded in Python source, e.g. inside a function body:

from textwrap import dedent
import yaml

data = """\
       - Hesperiidae
       - Papilionidae
       - Apatelodidae
       - Epiplemidae
       """

families = yaml.safe_load(dedent(data))

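For reference, dedent removes the longest common leading whitespace and normalizes whitespace-only lines. A minimal standalone sketch:

```python
from textwrap import dedent

indented = """\
    line one
    line two
    """

# the four-space margin is stripped; the whitespace-only last line is normalized
print(repr(dedent(indented)))  # 'line one\nline two\n'
```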

See also