Publications by Olivier Mallassi

Archi & techno

Scribe installation

Installing Scribe is a little tricky (I should point out that I am not what you would call a C++ compilation expert, and thanks to David for his help...). Here is how I installed Scribe on Ubuntu 10.04 LTS (Lucid Lynx, released in April 2010).

Archi & techno

HDFS, Hadoop & co…

The noSQL world is rich, and Hadoop is one of its components. "Roughly speaking" a clone of Google's Bigtable that uses the Map/Reduce algorithm, this Apache project is in fact made up of several sub-projects (HBase, Zookeeper...). You may point out that Google has since changed tack with BigQuery. Anyway... These articles (which I hope will be followed by others) explain the basics of HDFS and Hadoop in more detail. HDFS is a distributed file system, i.e. one spread across several physical machines. This system…

Archi & techno

How to “crunch” your data stored in HDFS?

HDFS stores huge amounts of data, but storing it is worthless if you cannot analyse it and extract information. Option #1: Hadoop, the Map/Reduce engine. Hadoop overview: Hadoop is a Map/Reduce framework that works on HDFS or on HBase. The main idea is to decompose a job into several identical tasks that can be executed closer to the data (on the DataNode). In addition, these tasks run in parallel: this is the Map phase. Then all these intermediate results are merged into one result…
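The Map and merge steps described in this excerpt can be illustrated with a minimal sketch in plain Java. It does not use the Hadoop API: the class `MapReduceSketch`, its methods, and the word-count job are hypothetical names chosen for illustration, and a parallel stream merely stands in for tasks running on separate DataNodes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class MapReduceSketch {

    // Map phase: one identical task per chunk of data.
    // Each task only looks at its own chunk and returns an intermediate result.
    static Map<String, Integer> mapTask(String chunk) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : chunk.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Merge step: combine all intermediate results into one result.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }

    static Map<String, Integer> run(List<String> chunks) {
        // Assumption: parallelStream() imitates tasks executed in parallel,
        // each one close to the chunk it processes.
        List<Map<String, Integer>> partials = chunks.parallelStream()
                .map(MapReduceSketch::mapTask)
                .collect(Collectors.toList());
        return reduce(partials);
    }

    public static void main(String[] args) {
        List<String> chunks = Arrays.asList(
                "hdfs stores data", "hadoop reads data", "data data");
        // prints {data=4, hadoop=1, hdfs=1, reads=1, stores=1}
        System.out.println(run(chunks));
    }
}
```

Because each map task depends only on its own chunk, the tasks can run in any order and on any machine; only the merge step needs to see all the intermediate results.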

Archi & techno

Hadoop Distributed File System : Overview & Configuration

The Hadoop Distributed File System can be considered a standard file system, but it is distributed. From the client's point of view, it looks like a standard file system (the one you have on your laptop), but behind the scenes the file system actually runs on several machines. HDFS implements fail-over using data replication and has been designed to store and manipulate large data sets (in large files) with a write-once-read-many access model for files.

Archi & techno

Event Sourcing & noSQL

I have watched Greg Young's talks about CQRS, and especially “Event Sourcing”, a couple of times, and each time I tell myself this pattern is just “génial” (as we say in French), even though Martin Fowler wrote about it back in 2005 and dealt in detail with implementation concerns and issues (especially integration with external systems). Event Sourcing: stop thinking of your data as a stock but rather as a list of events...
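To make the "list of events" idea concrete, here is a minimal sketch in plain Java, assuming a toy bank account. The class `EventSourcingSketch`, the event names `DEPOSITED` and `WITHDRAWN`, and the in-memory store are all hypothetical; they come neither from Greg Young's talks nor from Martin Fowler's article.

```java
import java.util.ArrayList;
import java.util.List;

public class EventSourcingSketch {

    // An event records something that happened; it is never updated or deleted.
    static final class Event {
        final String type;   // "DEPOSITED" or "WITHDRAWN" (hypothetical names)
        final long amount;

        Event(String type, long amount) {
            this.type = type;
            this.amount = amount;
        }
    }

    // Append-only log: the events are the only thing that is stored.
    static final class EventStore {
        private final List<Event> events = new ArrayList<>();

        void append(Event e) {
            events.add(e);
        }

        List<Event> all() {
            return new ArrayList<>(events); // defensive copy
        }
    }

    // The current state is not stored; it is recomputed by replaying the events.
    static long balance(List<Event> events) {
        long balance = 0;
        for (Event e : events) {
            switch (e.type) {
                case "DEPOSITED": balance += e.amount; break;
                case "WITHDRAWN": balance -= e.amount; break;
                default: throw new IllegalArgumentException("Unknown event: " + e.type);
            }
        }
        return balance;
    }

    public static void main(String[] args) {
        EventStore store = new EventStore();
        store.append(new Event("DEPOSITED", 100));
        store.append(new Event("WITHDRAWN", 30));
        store.append(new Event("DEPOSITED", 5));

        System.out.println(balance(store.all()));               // prints 75
        System.out.println(balance(store.all().subList(0, 2))); // prints 70
    }
}
```

Replaying only a prefix of the log, as in the last line, gives the state as it was at an earlier point in time, which a "stock" model that overwrites the balance cannot do.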

Event

Teradata & Cloudera: a partnership around Data Warehousing and Hadoop

As a rule, we try not to simply relay the news. Still, some news is surprising, especially in innovative contexts such as noSQL. Teradata & Cloudera are reportedly joining forces to offer an integration between Teradata and Cloudera's Hadoop distribution: Parallel processing frameworks, such as Hadoop, have a natural affinity to parallel data warehouses, such as the powerful Teradata analytical database engine. Although designed for very different types of data exploration, together the two approaches can be more valuable in mining massive…

Archi & techno

DevOps: the movement that aims to “agilify” your IT department

The "DevOps" community invites us to rethink the traditional boundary within our organizations, which separates development, i.e. those who write the code (the “Build”), from operations, i.e. those who deploy and run these applications (the “Run”). Two groups come together in the DevOps movement and bring a breath of fresh air to debates that are as old as IT departments themselves: the agilists, who have lifted the "constraint" on the development side and are now able to "deliver" software much more often…

Archi & techno

Let’s play with Cassandra… (3/3)

This part focuses on the client side and presents Java code examples for manipulating the business concepts defined in the previous part. Although the Cassandra APIs are available in several languages, we will concentrate on the Java API.

Archi & techno

Let’s play with Cassandra… (Part 3/3)

In this part, we will see a lot of Java code (the API exists in several other languages) and look at the client part of Cassandra. Use Case #0: Open and close a connection to any node of your cluster. Cassandra is now accessed using Thrift. The following code opens a connection to the specified node.

    TTransport tr = new TSocket("192.168.216.128", 9160);
    TProtocol proto = new TBinaryProtocol(tr);
    tr.open();
    Cassandra.Client cassandraClient = new Cassandra.Client(proto);
    ...
    tr.close();

As I said previously, the default API does not provide…

Archi & techno

This is the story of a project…

This is the story of a project, neither more complex nor simpler than others: an application that communicates with a database and two other systems. Something quite mainstream from a technical and architectural standpoint, something standard from the management side: everything is due yesterday and there is a lot to do… In short, “it’s gonna be hard”, as developers often say, but nobody says it out loud. So we build the team. 40 people are staffed, and people are specialized. The teams are…
