While setting up a new cluster with Hadoop (3.1.1) and Spark (2.4.0), I encountered this warning when running Spark:
19/02/05 13:06:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
To debug this issue, I used a useful trick explained on Stack Overflow.
Instead of tweaking the Hadoop logging config (usually /etc/hadoop/conf/log4j.properties), I (temporarily) added this line to the Spark logging config (usually /etc/spark/conf/log4j.properties):
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=DEBUG
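(A sketch of an alternative, in case you'd rather not edit the shared file even temporarily: since Spark 2.4 still uses log4j 1.x, you can point the driver at a private copy of the config with spark-shell --driver-java-options "-Dlog4j.configuration=file:/tmp/log4j-debug.properties", where the /tmp path is my own placeholder.)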
And then I got a bit more info about the warning:
19/02/05 13:20:52 DEBUG NativeCodeLoader: Trying to load the custom-built native-hadoop library...
19/02/05 13:20:52 DEBUG NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
19/02/05 13:20:52 DEBUG NativeCodeLoader: java.library.path=:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
19/02/05 13:20:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
The "native-hadoop" library files are at /opt/hadoop/lib/native/
in my case,
but that's not part of Spark's java.library.path
.
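To double-check that the library files themselves are usable (i.e. that this is purely a search-path problem on the Spark side), Hadoop ships a small diagnostic that tries to load them. Run through the hadoop launcher, which sets up the library path itself, it should report the native hadoop library as available:

hadoop checknative -a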
Apparently (see SPARK-1720, MAPREDUCE-4072) it should be enough to define an appropriate LD_LIBRARY_PATH environment variable (instead of java.library.path hacking).
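For completeness: Spark also has dedicated settings for this, spark.driver.extraLibraryPath and spark.executor.extraLibraryPath, so an equivalent fix (a sketch, I didn't test this route) would be two lines in /etc/spark/conf/spark-defaults.conf:

spark.driver.extraLibraryPath   /opt/hadoop/lib/native
spark.executor.extraLibraryPath /opt/hadoop/lib/native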
So I added
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
to /etc/spark/conf/spark-env.sh, and the NativeCodeLoader warning went away.
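As a sanity check, with the temporary DEBUG line still in place, NativeCodeLoader now reports success at startup with a line along the lines of:

DEBUG NativeCodeLoader: Loaded the native-hadoop library

After that, the DEBUG line can be removed from /etc/spark/conf/log4j.properties again.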