Nicolas Ruflin

A Passionate Technology Entrepreneur

Getting Started With the Internet of Things

The term Internet of Things has been around for quite some time. Until recently, however, the things were mostly offline and could only connect to the internet through a mobile phone, for example by scanning a QR code. This has changed over the last few months and years with quite a few start-ups entering the market, such as Ninja Blocks or SmartThings, devices such as Knut or Twine, and larger companies such as Philips with the Hue light bulb. Some of these are based on standards such as ZigBee, while others use their own implementations.

The introduction of very cheap and simple computers such as the Raspberry Pi and Arduino had a huge impact on the Internet of Things. Sensors that until then had mostly been offline can suddenly be connected to a cheap computer that connects to the internet. This makes it possible to access the sensors at any time from any location, typically from a smartphone. It is even possible to interact with a sensor, for example to turn a light on or open a door.

Most of these sensors can only do one thing and are basically “stupid”. The power is, and will continue to be, added through software. As these sensors are now all connected to each other, it is possible to write applications that interact with them and make the whole system intelligent.

From my point of view, the Internet of Things has only just gotten started. The more the things themselves disappear and the more intelligent the interaction becomes, the more it will help us in our daily lives, and we will completely forget how it was ever possible to live without it. It will take a few years until these kinds of things are built into houses, but I think we are now at a good point to finally get the Internet of Things running.

After playing around with a Raspberry Pi for quite some time, I finally ordered a Ninja Blocks Kit. I decided to go with the Ninja Blocks Kit for several reasons. First, I was only looking for a humidity sensor which I could monitor remotely, and this is offered by various providers. What I like about Ninja Blocks is that it is all based on open source: not only the software is open source, but the hardware plans are on GitHub as well. At the moment the product is probably more focused on geeks than on normal end users, but that is how the whole thing starts. I predict that in the coming months lots of small startups will build on this basic infrastructure/service to provide end-user-friendly solutions for all kinds of things such as home monitoring, gardening and much more.

I really look forward to getting my Ninja Blocks Kit. As soon as I get it I will post an update/review.

From Joomla to Octopress

After more than a year, I finally managed to upgrade my personal website/blog. My blog and gallery used to be based on Joomla, as I was a contributor to Joomla some years ago and had built some extensions which were running my site. Every time Joomla released a major update I ran into trouble because either I was too lazy to upgrade my own extensions or another external extension didn't work properly after the upgrade. As a result, it always took me forever to upgrade my website (even for security updates), and when I did upgrade, some things were broken.

Finally, more than a year ago, I decided to switch to something simpler, perhaps even something that I didn't have to host myself. I tried different services like Tumblr, Posterous, WordPress and more. These are all great solutions and make your life easier when you only want to blog. What bugs me with all these solutions is that it is again very hard to move to a different service in case one of them shuts down, as Posterous did recently.

I prefer not to write my blog posts in a WYSIWYG editor, even one that supports raw text. From time to time, I want to insert some JavaScript or other things that such editors manage to break. So I was looking for a solution where I can write blog posts in a standardized format (HTML, Markdown), if possible in my preferred editor. Through GitHub Pages I stumbled upon Jekyll. At first, I was very sceptical, as I thought it would be too limiting; on my old blog, I had some fancy extensions such as a gallery and other features.

From Jekyll I moved to Octopress, as it offers some nice additions on top of Jekyll. What I really like about the solution is that it allows me to define my URLs, have both pages and posts, and makes deployment really easy. The first plan was to migrate all blog entries from the old blog to the new one. As these were in two different languages and I couldn't find an import script for Joomla, I decided to only migrate some blog entries related to Elastica and otherwise start from scratch.

So here is the new, clean blog which will hopefully be filled with content again soon. At the moment I'm really happy with the new solution: creating blog entries is very easy, and putting content online is just one command.

In case you miss some old blog entries and would like to have them online again, please let me know by sending a tweet to @ruflin.

Include Elastica in Your Project as Svn:externals

As most of you know, Elastica is hosted on GitHub, which means it uses git as its revision control system. I have several projects which include Elastica but use Subversion as their version control system. Until now, I included Elastica as an external svn source by hosting my own Elastica svn repository. But yesterday I discovered that code from GitHub can also be checked out through svn. I immediately asked Google for more details about this feature and discovered several entries on the GitHub blog which I had somehow missed.

It is not only possible to check out entire repositories, but also specific subfolders or tags, and you can even commit to the repository (which I didn't test). As in my projects I only use the Elastica library folder and don't need all the tests and additional data, I check out only the lib folder. If you want to check out the Elastica lib folder of version v0.18.6.0, use the following command:

svn co https://github.com/ruflin/Elastica/tags/v0.18.6.0/lib/ .

If you have a lib folder in your project with all your frameworks and libraries and you want to add Elastica as an external source (which is quite useful), you can set the svn:externals property on your library folder to the following.

https://github.com/ruflin/Elastica/tags/v0.18.6.0/lib/Elastica Elastica

If you already have other sources added as externals to your repository (for example ZF), just add this line below your existing ones. The next time you update your repository, the Elastica folder with all its files will be checked out. To update to a newer version of Elastica, change the version number in the URL in your svn:externals property.
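In case you prefer setting the property from the command line instead of editing it in a GUI client, something along these lines should work (the lib folder name is just an example and depends on your project layout):

svn propset svn:externals "https://github.com/ruflin/Elastica/tags/v0.18.6.0/lib/Elastica Elastica" lib
svn commit -m "Add Elastica as svn:externals" lib
svn update lib

Note that svn propset overwrites the whole property, so if you already have other externals defined, edit the property with svn propedit instead and add the Elastica line there.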

Using Elastica With Multiple Elasticsearch Nodes

Elasticsearch was built with the cloud and multiple distributed servers in mind. It is quite easy to start an Elasticsearch cluster simply by starting multiple instances of Elasticsearch on one server or on multiple servers. Every Elasticsearch instance is called a node. To start multiple instances of Elasticsearch on your local machine, just run the following command in the elasticsearch folder twice:

./bin/elasticsearch -f
./bin/elasticsearch -f

As you will see, the first node is started on port 9200, the second instance on port 9201. Elasticsearch automatically discovers the other node and forms a cluster. Elastica can be used to retrieve all node and cluster information. In the following example, the cluster object (Elastica_Cluster) is first retrieved from the client and the cluster state is read out. Then all cluster nodes (Elastica_Node) are retrieved and the name of every node is printed. Every cluster has at least one node and every node has a specific name.

$client = new Elastica_Client();

// Retrieve an Elastica_Cluster object
$cluster = $client->getCluster();

// Returns the cluster state
$state = $cluster->getState();

// Gets all cluster nodes
$nodes = $cluster->getNodes();

foreach ($nodes as $node) {
    echo $node->getName();
}

Client to multiple servers

As Elasticsearch is a distributed search engine that can run on multiple servers, some servers can fail and search still works as expected, because the data is stored redundantly (replicas). The number of shards and replicas can be chosen for every single index during creation. Of course, this can also be set through Elastica when the index is created, as can be seen in the Elastica_Index test. More details on this perhaps in a later blog post.
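As a minimal sketch of what that looks like (the index name test and the concrete shard and replica numbers are just example values, and I assume the create() signature of that Elastica version, which takes an options array plus an optional recreate flag):

$client = new Elastica_Client();
$index = $client->getIndex('test');

// Create the index with example values for shards and replicas;
// the second argument recreates the index if one with the same name already exists
$index->create(array('number_of_shards' => 2, 'number_of_replicas' => 1), true);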

One of the goals of a distributed search index is availability: if one server goes down, search results should still be served. But if the client only connects to the one server that just went down, no results are returned anymore. Because of this, Elastica_Client supports multiple servers, which are accessed with a round-robin algorithm. This is the only and also the most basic option at the moment. So if we start the two nodes on port 9200 and port 9201 as above, we pass the following arguments to Elastica_Client to access both servers:

$client = new Elastica_Client(array(
  'servers' => array(
      array('host' => 'localhost', 'port' => 9200),
      array('host' => 'localhost', 'port' => 9201)
  )
));

From now on, every request is sent to one of these servers in round-robin fashion. Instead of localhost, external servers could also be used. I'm aware that this is still a quite basic implementation. As some of you have probably already realized, this is not a safe failover method, as every second request still goes to the server that is down. One idea here is to define a response-time threshold for every server; if a server exceeds it, the query goes to the next server. In addition, it would be useful to store this information about unavailable servers somewhere in order to use it for the next request, so that only one client has to wait for the unavailable server. Storing this information is somewhat of an issue, since Elastica does not have any storage backend.

Load Distribution

This client implementation also makes it possible to distribute the load over multiple nodes. As far as I know, Elasticsearch already does this quite well on its own, but it helps if more than one node can answer HTTP requests. The method above is therefore really useful if you run more than one Elasticsearch node in a cluster and want to send your requests to all servers.

It is planned to enhance this multiple-server implementation in the future with additional parameters such as a per-server priority and some other ideas. Please feel free to write down your ideas in the comment section or directly create a pull request on GitHub.

How to Log Requests in Elastica

In Elastica release v0.18.4.1, the capability to log requests was added. There is a general Elastica_Log object that can later also be extended to log other things such as responses, exceptions and more. The Elastica_Log constructor takes an Elastica_Client as a parameter. To enable logging, the client config variable log has to be set either to true or to a specific path the log should be written to. This means that every client instance decides on its own whether logging is enabled or not.

The example below will log the message “hello world” to the general PHP log.

$client = new Elastica_Client(array('log' => true));
$log = new Elastica_Log($client);
$log->log('hello world');

If a file path is set as the log config param instead, the “hello world” message is written to the /tmp/php.log file.

$client = new Elastica_Client(array('log' => '/tmp/php.log'));
$log = new Elastica_Log($client);
$log->log('hello world');

If logging is enabled, all requests are currently logged automatically. Requests are converted into log messages in shell (curl) format, so every log line can be pasted directly into the shell to try it out. This is quite nice for debugging and for creating a gist when others ask what a query looks like. Furthermore, it makes it simpler to figure out whether a problem is related to Elastica or not.

For example, the logged output for a request that updates the number_of_replicas setting of the index test would look like this:

curl -XPUT http://localhost:9200/test/_settings -d '{"index":{"number_of_replicas":0}}'
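As a rough sketch of the Elastica side of this (assuming the index settings object of that version exposes a setNumberOfReplicas() method), such a request could be triggered like this with logging enabled on the client:

$client = new Elastica_Client(array('log' => '/tmp/php.log'));
$index = $client->getIndex('test');

// Update the number_of_replicas setting; with logging enabled,
// the resulting request is written to the log in curl format
$index->getSettings()->setNumberOfReplicas(0);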

Storing and Analyzing Social Data

From July 2010 to December 2010 I worked on my master's thesis “Storing and Analyzing Social Data”. It is about the structure of social data, how to store social data (i.e. NoSQL solutions) and the processing of social data. You can download the full thesis here.

For a short overview, here is the abstract of the thesis and the embedded pdf.

Abstract

Social platforms such as Facebook and Twitter have been growing exponentially in the last few years. As a result of this growth, the amount of social data increased enormously. The need for storing and analyzing social data became crucial. New storage solutions – also called NoSQL – were therefore created to fulfill this need. This thesis will analyze the structure of social data and give an overview of currently used storage systems and their respective advantages and disadvantages for differently structured social data. Thus, the main goal of this thesis is to find out the structure of social data and to identify which types of storage systems are suitable for storing and processing social data. Based on concrete implementations of the different storage systems it is analyzed which solutions fit which type of data and how the data can be processed and analyzed in the respective system. A focus lies on simple analyzing methods such as the degree centrality and simplified PageRank calculations.

Master Thesis

Storing and Analyzing Social Data