Scribe

Archi & Techno

Scribe, Chukwa… collectors for feeding HDFS

HDFS, which we have already discussed, is a distributed file system, and it therefore needs to be fed with data. There are several options, starting with the batch approach. The first option is to keep collecting data on a local file system and to import it into HDFS at regular intervals. The second option would be to use an ETL: Pentaho has announced Hadoop support for its Data Integration product, and the first tests we ran show that it works…
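To make the first option concrete, here is a minimal sketch of such a periodic bulk import using the Hadoop FileSystem API; the NameNode URI, the local and HDFS paths, and the idea of triggering the job from a scheduler are assumptions made for the example, not details taken from the article.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Sketch of the "collect locally, import in batch" option.
    // The NameNode address and directory layout are illustrative assumptions.
    public class HdfsBulkImport {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://namenode:9000"); // adjust to your cluster
            FileSystem fs = FileSystem.get(conf);

            Path localLogs = new Path("/var/log/collected");  // where data piles up locally
            Path hdfsTarget = new Path("/data/raw/logs");     // destination in HDFS

            // copyFromLocalFile does the bulk load; scheduling this job (cron, Quartz...)
            // gives the "import at regular intervals" behaviour described above.
            fs.copyFromLocalFile(false /* keep local files */, true /* overwrite */, localLogs, hdfsTarget);
            fs.close();
        }
    }

An ETL such as Pentaho Data Integration essentially industrialises this kind of hand-rolled copy job.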

Archi & Techno

Scribe: a way to aggregate data and, why not, to fill HDFS directly?

HDFS is a distributed file system, and it quickly raises a question: how do you fill it with all your data? There are several options, ranging from batch import to straight-through processing. On the bulk-load side, the first option is to keep collecting data on a local file system and to import it into HDFS at regular intervals. The second is to use an ETL: Pentaho has announced Hadoop support for its Data Integration product, and the first tests we conducted lead us to think this works…
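As a rough illustration of the aggregation side, here is a minimal sketch of sending one log line to a Scribe server from Java over Thrift. The scribe.Client and scribe.LogEntry classes are assumed to be generated by the Thrift compiler from scribe.thrift, and the host, port, category and use of the non-strict binary protocol are assumptions of this sketch, not details from the article.

    import java.util.Collections;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;

    // Sketch of a Scribe client; scribe.Client and scribe.LogEntry are assumed to be
    // the classes generated from scribe.thrift.
    public class ScribeLogSketch {
        public static void main(String[] args) throws Exception {
            TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 1463));
            // Scribe is usually spoken to with the non-strict binary protocol.
            TBinaryProtocol protocol = new TBinaryProtocol(transport, false, false);
            scribe.Client client = new scribe.Client(protocol);

            transport.open();
            // A LogEntry is a (category, message) pair; Scribe routes messages by category,
            // and a store configured on the server side can then write them into HDFS.
            scribe.LogEntry entry = new scribe.LogEntry("access_log", "GET /index.html 200\n");
            client.Log(Collections.singletonList(entry));
            transport.close();
        }
    }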

Archi & Techno

Scribe installation

Installing Scribe is a little bit tricky (I should point out that I am not what you would call a C++ compilation expert, and thanks to David for his help...). So here is how I installed Scribe on my Ubuntu machine (Ubuntu 10.04 LTS, the Lucid Lynx, released in April 2010)
