Archi & Techno

fpaste-cli: Share content with magic and style

Hey people,

TL;DR

I hacked together a small tool that takes the text you highlight, sends it to fpaste, and puts the resulting fpaste link in your clipboard automatically.

Why?

Well, I just happen to share a lot of content (code snippets, application/middleware logs, ASCII art, you name it!) with other people using both fpaste and pastebin. It makes it easier to read text when trying to debug something.
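The workflow from the TL;DR (take the selected text, upload it, copy the link) could be sketched roughly as follows. This is a minimal sketch and not the tool's actual code: `share_text`, `upload` and `copy_to_clipboard` are hypothetical names, and the upload and clipboard steps are passed in as plain functions because this excerpt does not say how the tool talks to fpaste or to the clipboard.

```python
from typing import Callable


def share_text(
    text: str,
    upload: Callable[[str], str],
    copy_to_clipboard: Callable[[str], None],
) -> str:
    """Upload `text`, copy the resulting link, and return it."""
    if not text.strip():
        raise ValueError("nothing to share: the selection is empty")
    url = upload(text)       # hypothetical uploader: returns the paste URL
    copy_to_clipboard(url)   # hypothetical clipboard writer
    return url


if __name__ == "__main__":
    # Stand-ins so the sketch runs without network or clipboard access.
    clipboard = []
    link = share_text(
        "some log lines",
        upload=lambda text: "https://paste.example/abc123",
        copy_to_clipboard=clipboard.append,
    )
    print(link, clipboard)
```

Keeping the two side effects behind function parameters makes the flow easy to try out with fakes before wiring in a real paste service.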


Big Data

A chat with Doug Cutting about Hadoop

We had the chance to interview Doug Cutting during the Cloudera Sessions in Paris in October 2014. Doug is the creator of Hadoop and Cloudera’s Chief Architect. Here is our exchange:


How does it feel to see Hadoop actually becoming the must-have, the default way of storing and computing over data in large enterprises?

Rationally it feels very good. It’s a technology that’s supposed to do that. Emotionally it’s very satisfying, but also I must say I must be very lucky. I was in the right place at the right time and happened to be the person. Someone else would have done this had I not, by now.

It’s funny because yesterday you were mentioning how Google released that paper about GFS and then about MapReduce, and you seemed surprised that no one else has gone and implemented the paper. How would you describe this, because it was a very big, big task that some people were daunted by taking on or…?

I think, again, I have the right experience from having put some work in open source. I worked on search engines and I could see the value in the technology, I understood the problem, and that combination. And I think I’ve also been in the software business long enough so that’s why I knew what it’d take to build a project that would be useful, that would be used. And I think no one else was positioned ready enough in the competition with that combination of properties. I’ve been able to take advantage of these papers and implement them as open source, and get them out to people. My guess, I don’t know. It wasn’t my plan.


Big Data

Geo localizing Medline citations

Where are the scientific publications coming from? Geolocalizing Medline citations

When and where are the scientific publications coming from? Which countries collaborate the most? To investigate those questions, we focused on Medline, the major repository of peer-reviewed biology and biomedical citations.

Big Data is not only a buzzword. A rich ecosystem of tools has emerged, together with new architectural paradigms, to tackle large problems. Open data is flowing around, waiting for new analysis angles. We focused on the Medline challenge to demonstrate what can be achieved.

To give some insight into how an interactive web application was built to explore such data, we will discuss the geographic localization method based on free-text affiliations, Hadoop-oriented processing with Scala and Spark, interactive analysis with the Zeppelin notebook, and rendering with React, a modern JavaScript framework. The code has been open sourced on GitHub [1, 2] and the application is available on AWS.


Archi & Techno

Reduce your Android build duration

Build duration is a metric that every Android developer should monitor carefully. Even if you are very confident in the code you produce, you will have to run your project many times a day. When you re-run your code, you need to see the result of your modifications quickly. Otherwise, two things may happen: something will distract you and you will lose your focus, or you will go back to your code and forget to check the effects of your previous run.

Of course, this concern seems overblown when you are working on a small project that can be re-run in less than 30 seconds, but for huge applications the problem is real.

We can divide the re-run into two steps: the build phase and the deployment phase. Since we can barely reduce the duration of the second step (apart from running your app on an emulator), this article focuses on the levers we can use to shorten the build phase.

Archi & Techno

Centralize logs from Docker applications

This article aims at showing how we can centralize logs from a Docker application in a database where we can then query them.

This article is built around an example in which our application consists of an nginx instance, an Elasticsearch database, and Kibana to render graphs and diagrams. The code of the example is available on GitHub.

We need to collect and transport our logs as a data flow from a distributed system to a centralized remote location. That way, we can get an aggregated view of the system in near real time.
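Before log lines can be queried in a database, each one has to become a structured record. As an illustration only, here is a minimal sketch of that structuring step for nginx's default "combined" access log format. It assumes the application logs in that format, the `parse_access_line` helper is hypothetical, and the `@timestamp` field name is merely a common convention in the Elasticsearch and Kibana ecosystem. The collection and transport tooling is not named in this excerpt and is not covered here.

```python
import json
import re
from datetime import datetime

# nginx "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_access_line(line):
    """Turn one access log line into a dict, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    doc = match.groupdict()
    doc["status"] = int(doc["status"])
    doc["body_bytes_sent"] = int(doc["body_bytes_sent"])
    # Convert nginx's local time (e.g. 01/Oct/2015:12:00:00 +0000) to ISO 8601.
    ts = datetime.strptime(doc.pop("time_local"), "%d/%b/%Y:%H:%M:%S %z")
    doc["@timestamp"] = ts.isoformat()
    return doc


if __name__ == "__main__":
    sample = ('172.17.0.1 - - [01/Oct/2015:12:00:00 +0000] '
              '"GET /index.html HTTP/1.1" 200 612 "-" "curl/7.43.0"')
    print(json.dumps(parse_access_line(sample), indent=2))
```

Typed fields such as an integer `status` and an ISO 8601 timestamp are what make range queries and time-based charts practical once the records are indexed.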


Archi & Techno

ScyllaDB vs Cassandra: towards a new myth?

On September 22nd, 2015, a community of developers announced that they had designed and released a new database management system, described as the fastest in the world. This system, named ScyllaDB, is part of the NoSQL world, whose ambitions are:

  • Design scalable systems by distributing the workload and the storage over multiple machines.
  • Design fault-tolerant systems.
  • Provide higher throughput and larger storage with lower latency.

In this very competitive environment, ScyllaDB has an interesting characteristic: all its structures and mechanisms are copied from a very popular database, Cassandra. The main announced difference: ScyllaDB is written in C++, whereas Cassandra is written in Java.

Archi & Techno

Keep your gradle dependencies up to date seamlessly

Keeping your dependencies up to date is not the most fun part of a project’s development process, especially when the dependency list grows long. However, it is crucial to keep your dependencies as close as possible to the latest available versions in order to benefit from the latest improvements (such as bug fixes). The longer you wait, the harder the upgrade will be.

So what if your team received an email every week listing the latest available versions of your project’s dependencies? Some tools like Lint (or other static code analyzers) already provide such features, but unless you keep an eye on their reports you will not be warned about new versions.

In this quick tutorial we will set up a Jenkins job that runs a Gradle plugin task, in order to receive something like this by email:

The following dependencies have later milestone versions:
– com.android.support.test.espresso:espresso-core [2.0 -> 2.2.1]
– com.facebook.android:facebook-android-sdk [3.23.1 -> 4.8.2]
– com.fasterxml.jackson.core:jackson-annotations [2.5.3 -> 2.7.0-rc1]
– com.fasterxml.jackson.core:jackson-core [2.5.3 -> 2.7.0-rc1]
– com.fasterxml.jackson.core:jackson-databind [2.5.3 -> 2.7.0-rc1]
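If you want to post-process such a report (for example to filter out release candidates before sending the email), the lines above are regular enough to parse. The following is a minimal sketch that assumes the report keeps exactly the layout shown above. `Update` and `parse_report` are hypothetical names and not part of any plugin.

```python
import re
from typing import List, NamedTuple


class Update(NamedTuple):
    group: str
    name: str
    current: str
    latest: str


# Matches lines such as: – group:name [2.0 -> 2.2.1]
# The bullet may be an en dash (as shown above) or a plain hyphen.
LINE = re.compile(
    r'^\s*[–-]\s*(?P<group>[^:\s]+):(?P<name>\S+)\s+'
    r'\[(?P<current>[^\]\s]+)\s*->\s*(?P<latest>[^\]\s]+)\]\s*$'
)


def parse_report(report: str) -> List[Update]:
    """Extract (group, name, current, latest) from each dependency line."""
    updates = []
    for line in report.splitlines():
        match = LINE.match(line)
        if match:  # header and blank lines are simply skipped
            updates.append(Update(**match.groupdict()))
    return updates


if __name__ == "__main__":
    sample = """The following dependencies have later milestone versions:
– com.android.support.test.espresso:espresso-core [2.0 -> 2.2.1]
– com.fasterxml.jackson.core:jackson-core [2.5.3 -> 2.7.0-rc1]"""
    for update in parse_report(sample):
        print(f"{update.group}:{update.name} {update.current} -> {update.latest}")
```

With the updates in structured form, a rule such as "ignore versions containing `-rc`" becomes a one-line filter on `latest`.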
