Automatic JSON-formatted dictionaries in IPython/Jupyter

By Stefaan Lippens on 2023/04/05

IPython/Jupyter notebooks have built-in "pretty" formatting of dictionary (and related) constructs. For example, take this messy, nested dictionary construct:

Read more →

Programmatically creating clients and users in Keycloak

By Stefaan Lippens on 2023/04/04

How to create clients and users programmatically in Keycloak, using Python.

Read more →

git-annex and Homebrew version woes (or how to spend your evenings with open source software)

By Stefaan Lippens on 2022/06/26

The other night I wanted to upload some photos to my git-annex remote on our home NAS (a Synology DS416, let's call it mallorca), but reality decided otherwise:

$ git annex copy --to mallorca path/to/nice-picture-01.jpg
fatal: Run with no arguments or with -c cmd
git-annex-shell: git-shell failed
(unable …

Read more →

Bunch o' cheat sheets

By Stefaan Lippens on 2022/04/22

You're probably very familiar with the tools you use daily and operate them from muscle memory. But there are also these setup or maintenance tasks you only do every X months and their practical details are a bit hazy.

This is a random, work-in-progress collection of cheat sheets for these …

Read more →

Disable pytest's log/print capturing

By Stefaan Lippens on 2020/03/03

Yet another note to self.

You're working on some unit tests in pytest and its default log/print capturing is getting a bit in the way. You want to see print or logging calls immediately when they happen and not in some captured/delayed fashion.

Add these command line options …

Read more →

Yet another solution to dig you out of a circular import hole in Python

By Stefaan Lippens on 2019/09/27

The circular import problem in Python: some module foo imports module bar, but bar also imports foo. In itself, that is not necessarily a problem. Python allows it. Depending on how both modules interact, you might not even notice there is a cycle in the dependency chain.
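To make the "Python allows it" case concrete, here is a minimal sketch of a harmless cycle. The module names demo_foo and demo_bar and their functions are made up for this illustration (they are not taken from the post); the modules are written to a temporary folder so the snippet is self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two hypothetical modules that import each other at the top level.
# Each only touches the other module's attributes at call time,
# so the cycle never bites.
FOO_SOURCE = '''
import demo_bar

def greet():
    return "foo"

def ask_bar():
    return demo_bar.greet()
'''

BAR_SOURCE = '''
import demo_foo

def greet():
    return "bar"

def ask_foo():
    return demo_foo.greet()
'''


def run_demo():
    """Write both modules to a temp folder, import them, call across the cycle."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_foo.py").write_text(FOO_SOURCE)
        Path(tmp, "demo_bar.py").write_text(BAR_SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            demo_foo = importlib.import_module("demo_foo")
            demo_bar = importlib.import_module("demo_bar")
            return demo_foo.ask_bar(), demo_bar.ask_foo()
        finally:
            # Clean up so the demo leaves no trace in the interpreter state.
            sys.path.remove(tmp)
            sys.modules.pop("demo_foo", None)
            sys.modules.pop("demo_bar", None)


result = run_demo()
print(result)
```

Both imports succeed because neither module needs anything from the other while it is still being initialized; the cross-module lookups only happen later, when the functions are called.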

However, when you have a …

Read more →

Step by Step OAuth 2.0 Authorization Code Flow with PKCE

By Stefaan Lippens on 2019/09/24

In this notebook, I will dive into the OAuth 2.0 Authorization Code flow with PKCE step by step in Python, using a local Keycloak setup as authorization provider. The focus lies on practical, low-level HTTP operations. We won't even use an actual browser, nor will we need an actual HTTP server for the redirect URL.
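As a taste of the PKCE part, here is a minimal sketch of how a code verifier and its S256 code challenge can be derived with only the standard library, following RFC 7636. The function names are my own for this illustration and are not necessarily what the notebook uses.

```python
import base64
import hashlib
import secrets


def make_code_verifier(n_bytes: int = 32) -> str:
    """Random URL-safe verifier; 32 random bytes give 43 characters."""
    raw = secrets.token_bytes(n_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(code_verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) without '=' padding."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


code_verifier = make_code_verifier()
code_challenge = make_code_challenge(code_verifier)
print("code_verifier: ", code_verifier)
print("code_challenge:", code_challenge)
```

The client sends the challenge (with code_challenge_method=S256) in the authorization request and keeps the verifier to itself until the token request.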

Read more →

Header duplication in Spark partitioned CSV files

By Stefaan Lippens on 2019/03/19

You are writing a Spark DataFrame to a CSV file with a header line on HDFS.

df.write.csv('output_folder', header=True)

Because your DataFrame is partitioned, you get multiple CSV files in your output folder. Each file will get a header with column names.
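The per-file header behaviour can be mimicked without a Spark cluster. The following is a plain-Python stand-in, not Spark code: the column names, the rows and the part-0000N.csv file names are made up for this illustration. It writes one CSV file per "partition", each with its own header, and then counts how many header lines a naive concatenation of those files would contain.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical data: three "partitions" of one table.
COLUMNS = ["id", "name"]
PARTITIONS = [
    [(1, "a"), (2, "b")],
    [(3, "c")],
    [(4, "d"), (5, "e")],
]


def write_partitions(folder, partitions):
    """Write one CSV file per partition, each starting with the header row."""
    paths = []
    for index, rows in enumerate(partitions):
        path = Path(folder, f"part-{index:05d}.csv")
        with path.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(COLUMNS)
            writer.writerows(rows)
        paths.append(path)
    return paths


def concatenate(paths):
    """Naively glue the files together, line by line."""
    lines = []
    for path in paths:
        lines.extend(path.read_text().splitlines())
    return lines


with tempfile.TemporaryDirectory() as folder:
    part_files = write_partitions(folder, PARTITIONS)
    merged = concatenate(part_files)

header_count = merged.count(",".join(COLUMNS))
print(f"{len(part_files)} files, {header_count} header lines after merging")
```

Each part file is valid on its own, but anything that treats the folder as a single stream of lines sees the header once per file.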

So far so good, but …

Read more →

HiveServer2 User Impersonation Issues

By Stefaan Lippens on 2019/02/16

While setting up Apache Hive, HiveServer2 and Beeline (using vanilla packages instead of some kind of prepackaged Hadoop distribution), I struggled with some permission/user related problems. The error message I got stuck with was something like this:

org.apache.hadoop.security.authorize.AuthorizationException
User: hive is not allowed to …

Read more →

"Native-Hadoop" Library Load Issues with Spark

By Stefaan Lippens on 2019/02/05

While setting up a new cluster with Hadoop (3.1.1) and Spark (2.4.0), I encountered these warnings when running Spark:

19/02/05 13:06:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

To debug this issue, I used …

Read more →

Stefaan Lippens inserts content here

yet another weblog
