Big Data

Confluent.io: Part 3 – STREAM PROCESSING

This article is part of a series designed to demonstrate the setup and use of the Confluent Platform. In this series, our goal is to build an end-to-end data processing pipeline with Confluent. Disclaimer: while knowledge of Kafka internals is not required to follow this series, it can sometimes help clarify some parts of the articles. In the previous articles, we set up two topics: one to publish the input data coming from PostgreSQL and another one to push the data from…

Read more
Big Data

Confluent.io – Part 2: BUILD A STREAMING PIPELINE

This article is part of a series designed to demonstrate the setup and use of the Confluent Platform. In this series, our goal is to build an end-to-end data processing pipeline with Confluent. Disclaimer: while knowledge of Kafka internals is not required to follow this series, it can sometimes help clarify some parts of the articles. BASICS If you have gone through every step of our previous article, you should have a Kafka broker running along with Zookeeper and Control Center. Now,…

Read more
Big Data

Confluent.io – Part 1: INTRODUCTION & SETUP

This article is part of a series designed to demonstrate the setup and use of the Confluent Platform. In this series, our goal is to build an end-to-end data processing pipeline with Confluent. Disclaimer: while knowledge of Kafka internals is not required to follow this series, it can sometimes help clarify some parts of the articles. INTRODUCTION Let’s begin with two questions: what is the Confluent Platform, and why use it? What? The Confluent Platform is a data streaming platform built…

Read more
Big Data

Visualizing massive data streams: a public transport use case

Public transport companies release more data every day, and some are even opening their information systems up to real-time streaming (Swiss transport, TPG in Geneva and RATP in Paris are a few local examples). Vast territories open up for technical experimentation! Besides real-time data, these companies also publish their full schedules. In Switzerland, these describe trains, buses, tramways, boats and even gondolas. In this post, we propose to walk through an application built to visualize, in fast motion, one day of activity,…

Read more
Big Data

A quick summary and some thoughts on the Scikit-learn workshop

On December 2nd, the workshop “Using Scikit-learn and Scientific Python at Scale” was held at Telecom ParisTech, with top contributors from the project as speakers. The workshop was divided into four talks: Scikit-learn for industrial applications, basic research and mind reading (Alexandre Gramfort); Distributed computing for predictive modeling in Python (Olivier Grisel); Scikit-learn at scale: out-of-core methods (Thierry Guillemot); An industrial application at Airbus Group (Vincent Feuillard). Scikit-learn is currently the most widely used open source library…

Read more
Big Data

D3.js transitions killed my CPU! A d3.js & pixi.js comparison

D3.js is certainly the most versatile JavaScript data rendering library available: turning data into mind-blowing visualizations is limited only by your imagination. A key component for turning static pages into animated ones is the powerful selection transition mechanism. However, too many simultaneous transitions on a web page will soon bring your CPU to its knees. Hence this blog post. We faced this problem when displaying Swiss transport real-time data on a map, within an SVG layout: rendering was lagging, event-sourced data were not…

Read more
Big Data

A Journey into Industrializing the Writing and Deployment of Kibana Plugins (riding Docker)

by Alexandre Masselot (OCTO Technology Switzerland), Catherine Zwahlen (OCTO Technology Switzerland) and Jonathan Gianfreda. The possibility of custom plugins is a strong Kibana promise. We propose an end-to-end tutorial for writing such plugins. But this “end to end” approach also raises further questions: how to continuously deploy them? How to share an environment with seeded data? These questions will lead us to a full-fledged integration infrastructure, backed by Docker. Elasticsearch has grown from a Lucene offshoot into a full-fledged distributed document store, with powerful storage,…

Read more
Big Data

A chat with Doug Cutting about Hadoop

We had the chance to interview Doug Cutting during the Cloudera Sessions in Paris, October 2014. Doug is the creator of Hadoop and Cloudera’s Chief Architect. Here is our exchange: First question: how does it feel to see that Hadoop is actually becoming the must-have, the default way of storing and computing over data in large enterprise companies? Rationally, it feels very good. It’s a technology that’s supposed to do that. Emotionally it’s very satisfying, but also I must say I must…

Read more