Scribe installation
Scribe installation is a little bit tricky (I should say upfront that I am not what you would call a C++ compilation expert, and thanks to David for his help...). So here is how I installed Scribe on my Ubuntu box (Ubuntu 10.04 LTS, the Lucid Lynx, released in April 2010).
Scribe compilation: get the basic packages...
To compile Scribe, you need a couple of dependencies. As far as I remember, I needed to install (via apt-get) the following packages (the first ones are certainly already part of your distribution; a combined install command is sketched after the list):
libtool
automake
autoconf
g++
make
libboost-dev (version 1.38.1 in my case) and/or libboost-all-dev
flex
bison
pkg-config
build-essential
mono-gmcs
libevent-dev
python
python-dev
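For convenience, here is the same list as a single command (nothing new here, just the packages above; I use libboost-all-dev, pin libboost-dev to 1.38.1 instead if you want to match my exact setup):
sudo apt-get install libtool automake autoconf g++ make libboost-all-dev flex bison pkg-config build-essential mono-gmcs libevent-dev python python-dev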
...compile Thrift and fb303...
You first need to compile Scribe's dependencies: Thrift, and fb303, which is part of the Thrift distribution.
Get the source code from the repository
svn co http://svn.apache.org/repos/asf/incubator/thrift/trunk thrift
Then, in the ./thrift folder, run:
./bootstrap.sh && ./configure && make && sudo make install
Then, in the ./thrift/contrib/fb303/ folder, run:
./bootstrap.sh && sudo make && sudo make install
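If everything went well, the Thrift compiler should now be installed. A quick sanity check (the version string you see will depend on the trunk revision you checked out):
thrift -version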
...and then compile Scribe.
Now you can compile Scribe itself.
Get the source from the git repository http://github.com/facebook/scribe.git or download version 2.2 from here.
Then run:
./bootstrap.sh && ./configure && make && sudo make install
Scribe should work...
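A quick way to verify the build is to start scribed with one of the sample configurations shipped in the source tree and push a test message through it. This is only a sketch: the examples/example1.conf and examples/scribe_cat names come from the Scribe source tree I used, so adjust the paths if your checkout differs.
scribed examples/example1.conf
echo "hello world" | ./examples/scribe_cat test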
Scribe compilation with HDFS support: you forgot the option, didn't you?
If you want to use Scribe to log data into HDFS, you need to compile Scribe with the --enable-hdfs option. How will you find out you forgot it? Try to use HDFS with a default build of Scribe and you will get one of the cleanest log messages I have ever seen :)
So you have to recompile Scribe, and the main challenge will be to find compatible versions of Scribe and Hadoop... The following compilation was done with the Apache distribution of Hadoop 0.21.0 and Scribe 2.2. Both $HADOOP_HOME and $JAVA_HOME must be set (I used jdk1.6.0_16).
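For example (the JDK and Hadoop locations below are only guesses, adjust them to wherever they actually live on your machine):
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_16
export HADOOP_HOME=/usr/local/hadoop-0.21.0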
Then, from your Scribe source folder:
./bootstrap.sh --enable-hdfs
./configure --with-hadooppath=$HADOOP_HOME --enable-hdfs CPPFLAGS="-I$HADOOP_HOME/hdfs/src/c++/libhdfs/ -I$JAVA_HOME/include/ -I$JAVA_HOME/include/linux/" LDFLAGS="-ljvm -lhdfs -L$JAVA_HOME/jre/lib/i386/client -L$HADOOP_HOME/c++/Linux-i386-32/lib/"
make
The make should fail, so I needed to modify ./src/HdfsFile.cpp (the last time I wrote C++ was a very long time ago...):
- modify the deleteFile method: it contains a call to hdfsDelete, but the third parameter is missing. You need to change the line to this one (see the sketch after this list):
hdfsDelete(fileSys, filename.c_str(), 1);
- You also need to apply the following patch if you want Scribe to be able to write to HDFS.
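For the record, here is the shape of the deleteFile fix. The hdfsDelete prototype comes from libhdfs in Hadoop 0.21; the surrounding method body is only a sketch, the one-line change above is the actual fix:
/* From $HADOOP_HOME/hdfs/src/c++/libhdfs/hdfs.h (Hadoop 0.21):
 *   int hdfsDelete(hdfsFS fs, const char* path, int recursive);
 * Older Scribe code calls it with only two arguments. */
void HdfsFile::deleteFile() {  // signature sketched from HdfsFile.cpp
  if (fileSys) {
    // 1 = delete recursively; this third argument is what Hadoop 0.21 expects
    hdfsDelete(fileSys, filename.c_str(), 1);
  }
}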
Run make again, then sudo make install.
Scribe configuration: the easy part
In order to make Scribe work, you need a couple of configuration steps. First, you need to set $JAVA_HOME and $HADOOP_HOME (I tested with jdk 1.6.0_16 and Hadoop 0.21.0). Then:
export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/i386/client:$HADOOP_HOME/c++/Linux-i386-32/lib
or you will get this kind of error:
scribed: error while loading shared libraries: libjvm.so: cannot open shared object file
And, in order to write data directly to HDFS, you need to set your CLASSPATH variable:
export CLASSPATH=$HADOOP_HOME/hadoop-hdfs-0.21.0.jar:$HADOOP_HOME/hadoop-common-0.21.0.jar:$HADOOP_HOME/lib/commons-logging-1.1.1.jar
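Finally, to actually write to HDFS you point a Scribe store at an hdfs:// path in the configuration file. Here is a minimal sketch, modeled on the example configurations shipped with Scribe; the namenode address hdfs://localhost:9000 and the file paths are assumptions, replace them with your own:
port=1463
max_msg_per_second=2000000
check_interval=3

<store>
category=default
type=buffer
target_write_size=20480
max_write_interval=1
buffer_send_rate=1
retry_interval=30

<primary>
# primary store: write into HDFS (requires the --enable-hdfs build)
type=file
fs_type=hdfs
file_path=hdfs://localhost:9000/scribedata
base_filename=thisisoverwritten
max_size=1000000000
add_newlines=1
</primary>

<secondary>
# secondary store: local disk fallback while HDFS is unreachable
type=file
fs_type=std
file_path=/tmp/scribetest
base_filename=thisisoverwritten
max_size=3000000
</secondary>
</store>
The buffer store keeps messages in the secondary store while HDFS is down and replays them to the primary store once it is reachable again.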