Scribe installation
Scribe installation is a little bit tricky (I should say upfront that I am not what you would call a C++ compilation expert, and thanks to David for his help...). So here is how I installed Scribe on my Ubuntu box (Ubuntu 10.04 LTS, the Lucid Lynx, released in April 2010).
Scribe compilation: get the basic packages...
To compile Scribe, you need a couple of dependencies. As far as I remember, I needed to install (via apt-get) the following packages (the first ones are certainly already part of your distribution; a combined install command is sketched after the list):
libtool
automake
autoconf
g++
make
libboost-dev (version 1.38.1 in my case) and/or libboost-all-dev
flex
bison
pkg-config
build-essential
mono-gmcs
libevent-dev
python
python-dev
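For convenience, here is the same list as a single command (nothing new here, just the packages above; I use libboost-all-dev, pin libboost-dev to 1.38.1 instead if you want to match my exact setup):
sudo apt-get install libtool automake autoconf g++ make libboost-all-dev flex bison pkg-config build-essential mono-gmcs libevent-dev python python-dev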
...compile Thrift and fb303...
You first need to compile Scribe's dependencies: Thrift, and fb303, which is part of the Thrift distribution.
Get the source code from the repository
svn co http://svn.apache.org/repos/asf/incubator/thrift/trunk thrift
Then, in the ./thrift folder, run:
./bootstrap.sh && ./configure && make && sudo make install
Then, in the ./thrift/contrib/fb303/ folder, run:
./bootstrap.sh && sudo make && sudo make install
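If everything went well, the Thrift compiler should now be installed. A quick sanity check (the version string you see will depend on the trunk revision you checked out):
thrift -version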
...and then compile Scribe.
Now you can compile Scribe itself.
Get the source from the git repository http://github.com/facebook/scribe.git or download version 2.2 from here.
Then run:
./bootstrap.sh && ./configure && make && sudo make install
Scribe should work...
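A quick way to verify the build is to start scribed with one of the sample configurations shipped in the source tree and push a test message through it. This is only a sketch: the examples/example1.conf and examples/scribe_cat names come from the Scribe source tree I used, so adjust the paths if your checkout differs.
scribed examples/example1.conf
echo "hello world" | ./examples/scribe_cat test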
Scribe compilation with HDFS support: you forgot the option, didn't you?
If you want to use Scribe to log data into HDFS, you need to compile Scribe with the --enable-hdfs option. How will you find out you forgot it? Try to use HDFS with a default build of Scribe and you will get one of the cleanest log messages I have ever seen :)
So you have to recompile Scribe, and the main challenge will be to find compatible versions of Scribe and Hadoop... The following compilation was done with the Apache distribution of Hadoop 0.21.0 and Scribe 2.2. Both $HADOOP_HOME and $JAVA_HOME must be set (I used jdk1.6.0_16).
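For example (the JDK and Hadoop locations below are only guesses, adjust them to wherever they actually live on your machine):
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_16
export HADOOP_HOME=/usr/local/hadoop-0.21.0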
Then, from your Scribe source folder:
./bootstrap.sh --enable-hdfs
./configure --with-hadooppath=$HADOOP_HOME --enable-hdfs CPPFLAGS="-I$HADOOP_HOME/hdfs/src/c++/libhdfs/ -I$JAVA_HOME/include/ -I$JAVA_HOME/include/linux/" LDFLAGS="-ljvm -lhdfs -L$JAVA_HOME/jre/lib/i386/client -L$HADOOP_HOME/c++/Linux-i386-32/lib/"
make
The make should fail, so I needed to modify ./src/HdfsFile.cpp (the last time I wrote C++ was a very long time ago...):
- modify the deleteFile method: it contains a call to hdfsDelete, but the third parameter is missing. You need to change the line to this one (see the sketch after this list):
hdfsDelete(fileSys, filename.c_str(), 1);
- You also need to apply the following patch if you want Scribe to be able to write to HDFS.
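For the record, here is the shape of the deleteFile fix. The hdfsDelete prototype comes from libhdfs in Hadoop 0.21; the surrounding method body is only a sketch, the one-line change above is the actual fix:
/* From $HADOOP_HOME/hdfs/src/c++/libhdfs/hdfs.h (Hadoop 0.21):
 *   int hdfsDelete(hdfsFS fs, const char* path, int recursive);
 * Older Scribe code calls it with only two arguments. */
void HdfsFile::deleteFile() {  // signature sketched from HdfsFile.cpp
  if (fileSys) {
    // 1 = delete recursively; this third argument is what Hadoop 0.21 expects
    hdfsDelete(fileSys, filename.c_str(), 1);
  }
}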
Run make again, then sudo make install.
Scribe configuration: the easy part
In order to make Scribe work, you need a couple of configuration steps. First, you need to set $JAVA_HOME and $HADOOP_HOME (I tested with jdk 1.6.0_16 and Hadoop 0.21.0). Then:
export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/i386/client:$HADOOP_HOME/c++/Linux-i386-32/lib
or you will get this kind of error:
scribed: error while loading shared libraries: libjvm.so: cannot open shared object file
And, in order to write data directly to HDFS, you need to set your CLASSPATH variable:
export CLASSPATH=$HADOOP_HOME/hadoop-hdfs-0.21.0.jar:$HADOOP_HOME/hadoop-common-0.21.0.jar:$HADOOP_HOME/lib/commons-logging-1.1.1.jar
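Finally, to actually write to HDFS you point a Scribe store at an hdfs:// path in the configuration file. Here is a minimal sketch, modeled on the example configurations shipped with Scribe; the namenode address hdfs://localhost:9000 and the file paths are assumptions, replace them with your own:
port=1463
max_msg_per_second=2000000
check_interval=3

<store>
category=default
type=buffer
target_write_size=20480
max_write_interval=1
buffer_send_rate=1
retry_interval=30

<primary>
# primary store: write into HDFS (requires the --enable-hdfs build)
type=file
fs_type=hdfs
file_path=hdfs://localhost:9000/scribedata
base_filename=thisisoverwritten
max_size=1000000000
add_newlines=1
</primary>

<secondary>
# secondary store: local disk fallback while HDFS is unreachable
type=file
fs_type=std
file_path=/tmp/scribetest
base_filename=thisisoverwritten
max_size=3000000
</secondary>
</store>
The buffer store keeps messages in the secondary store while HDFS is down and replays them to the primary store once it is reachable again.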