While setting up a new cluster with Hadoop (3.1.1) and Spark (2.4.0), I encountered this warning when running Spark:
19/02/05 13:06:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
To debug this issue, I used a useful trick explained on Stack Overflow.
Instead of tweaking the Hadoop logging config (usually /etc/hadoop/conf/log4j.properties), I (temporarily) added this line to the Spark logging config (usually /etc/spark/conf/log4j.properties):
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=DEBUG
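(A sketch of an alternative, in case you'd rather not edit the shared file even temporarily: since Spark 2.4 still uses log4j 1.x, you can point the driver at a private copy of the config with spark-shell --driver-java-options "-Dlog4j.configuration=file:/tmp/log4j-debug.properties", where the /tmp path is my own placeholder.)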
And then I got a bit more info about the warning:
19/02/05 13:20:52 DEBUG NativeCodeLoader: Trying to load the custom-built native-hadoop library...
19/02/05 13:20:52 DEBUG NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
19/02/05 13:20:52 DEBUG NativeCodeLoader: java.library.path=:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
19/02/05 13:20:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
The "native-hadoop" library files are at /opt/hadoop/lib/native/
in my case,
but that's not part of Spark's java.library.path
.
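To double-check that the library files themselves are usable (i.e. that this is purely a search-path problem on the Spark side), Hadoop ships a small diagnostic that tries to load them. Run through the hadoop launcher, which sets up the library path itself, it should report the native hadoop library as available:

hadoop checknative -a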
Apparently (see SPARK-1720, MAPREDUCE-4072) it should be enough to define an appropriate LD_LIBRARY_PATH environment variable (instead of java.library.path hacking).
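For completeness: Spark also has dedicated settings for this, spark.driver.extraLibraryPath and spark.executor.extraLibraryPath, so an equivalent fix (a sketch, I didn't test this route) would be two lines in /etc/spark/conf/spark-defaults.conf:

spark.driver.extraLibraryPath   /opt/hadoop/lib/native
spark.executor.extraLibraryPath /opt/hadoop/lib/native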
So I added
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
to /etc/spark/conf/spark-env.sh, and the NativeCodeLoader warning went away.
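As a sanity check, with the temporary DEBUG line still in place, NativeCodeLoader now reports success at startup with a line along the lines of:

DEBUG NativeCodeLoader: Loaded the native-hadoop library

After that, the DEBUG line can be removed from /etc/spark/conf/log4j.properties again.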