Blog

Machine learning tutorials mini-site


It’s been ages since I wrote my last post. I am planning to be more active from now on (I hope).

I’ve been wanting to do a mini-site with machine learning tutorials for years and finally here it is!

The mini-site is at ml-tutorials.kyrcha.info and its GitHub repo is at https://github.com/kyrcha/ml-tutorials

The main reason for finally getting it done was that I started teaching two data mining courses in two postgraduate programs (one in the fall and one in the spring semester, with different audiences) and I wanted to have some notes to give to students, with R implementations of the algorithms I teach in theory in the classroom. The mini-site also includes introductory material on R to help you get familiar with it.

At the moment I only discuss the R specifics of the algorithms, but I plan to add some theory to each algorithm as well, to make the tutorials more standalone.

For creating the site I used R Markdown Websites and RStudio. A great resource is this cheatsheet.

Jupyter vs. R Markdown

I started this effort by working with Jupyter notebooks with an R-kernel, but the reasons that made me switch to R Markdown and RStudio were that:

  1. You can create mini-sites like this one out of the box.
  2. I ran into problems when I tried to render the Jupyter notebooks to pdf to hand out to students.
  3. R Markdown is in markdown and not in JSON, so it is easier to edit it with a text editor.
  4. It works well with GitHub and GitHub pages project sites.

Deployment

I wanted GitHub to serve the rendered html pages via the GitHub pages project site functionality, using a custom domain: the subdomain ml-tutorials.kyrcha.info. After searching around a bit, I set it up as follows:

Step 1: Configured the site rendering tool to put the generated html files into a docs folder.

Step 2: Added a footer with a new google analytics property to check out the traffic.

Step 3: In the repo settings in GitHub I added

GitHub pages configuration

The above will add a CNAME file in the docs folder. Since the docs folder is deleted and re-created whenever the site is rendered, I keep the CNAME file in the root folder of the project and added include: ["CNAME"] to the _site.yml configuration file, so that it is copied into the docs folder every time the site is rendered.

Step 4: Finally, I also created a CNAME record in my DNS provider, with name: ml-tutorials and value: kyrcha.github.io.

Custom DNS configuration

Now http://ml-tutorials.kyrcha.info/ shows whatever GitHub pages serves for https://kyrcha.github.io/ml-tutorials, and https://kyrcha.github.io/ml-tutorials redirects to http://ml-tutorials.kyrcha.info/.

Whenever I want to add a new tutorial or update an older one I:

  1. Make the changes in my Rmd files
  2. Render the site: rmarkdown::render_site()
  3. Do a git add and a git commit in the local repository and push both the source and the rendered html pages to GitHub.
  4. If I want to render a specific page to pdf I enter: rmarkdown::render("knn.Rmd", output_format="pdf_document")

The S-CASE concept


This is a post I wrote for the S-CASE project blog. S-CASE, or Scaffolding Scalable Software Services, is an EU-funded FP7 project I am currently working on as a technical coordinator. The post below describes what the project is about.

The S-CASE project is about semi-automatically creating RESTful Web Services from multi-modal requirements using a Model Driven Engineering methodology. The world of web services is moving towards REST, and S-CASE aims to help developers implement such web services by focusing mainly on requirements engineering. The figure below depicts the basic components and the basic flow of events/data in S-CASE.

S-CASE workflow

Typical use case scenario

Through the S-CASE IDE the user imports or creates multi-modal requirements for his/her envisioned application. The requirements may be:

  • Textual requirements in the form “The user/system must be able to …”,
  • UML activity and use case diagrams created in the platform or imported as images,
  • Storyboards for flow charting, and
  • Analysis class diagrams to improve the system’s accuracy in identifying entities, their properties and their relationships.

The requirements are then processed through natural language processing and image analysis techniques in order to extract relevant software engineering concepts. These are mainly the identification of RESTful resources, their properties and relations and Create-Read-Update-Delete (CRUD) actions on resources. All these concepts are stored in the S-CASE ontology.

The above procedure also separates action-resource tuples that the system can create automatically, like “create bookmark” (automatically built), from others that need more elaborate processes, like “get the weather given geolocation coordinates” (semi-automatically built or composed). The latter are sent to the Web Services Synthesis and Composition module.

The Web Services Synthesis and Composition module tries to synthesize elaborate processes by composing 3rd party web services into a single S-CASE composite web service. To perform such a computation, S-CASE provides a methodology for semantically annotating 3rd party web services using S-CASE domain ontologies, so that they can later be matched to the requirements of the composite service. The composite service is deployed to the YouREST deployment environment and registered in the directory of S-CASE web services for future reference and re-use.

Upon completing the stages above, the model driven engineering procedure begins. The first step is to create the Computation Independent Model (CIM) out of the S-CASE ontology. The CIM contains the bare minimum information needed to scaffold a REST service that adheres to the requirements imposed by the user, i.e. it includes all the concepts of the problem domain. Model transformations then turn the CIM into a Platform Independent Model (PIM), which incorporates design constraints but remains platform independent, and then into a Platform Specific Model (PSM), which adds support for implementing the PIM with a specific suite of software tools such as Java, JAX-RS, Hibernate, JSON, JAXB and PostgreSQL. The final step is to automatically generate the code of the web service. Calls to composite services are wrapped inside the generated code. The code is built and deployed to YouREST for others to use.

In order to support software re-use, every software artifact created from this procedure is stored into the S-CASE repository for future retrieval.

Through S-CASE we plan to develop an ecosystem of services, along with the appropriate tools for service providers to develop quality software for SMEs with an affordable budget.

Searchable, scrollable bootstrap dropdown with angularjs


So you are working in AngularJS, you are using the Bootstrap framework, and the requirement is to create a dropdown button whose menu includes so many (list) items that it needs to be a) scrollable and b) searchable.

The following code presents a solution to the above problem.

We created a dropdown button with menu items coming from the angular controller. At the top of the menu an input element is added as a list item and bound to the scope variable query. This will act as the filter in the ng-repeat directive. The problem is that at this point clicking inside the input element will instantly close the dropdown since the event is propagated up the DOM tree. Thus the jQuery stopPropagation method is used for stopping the event from bubbling up.
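The two ideas in this paragraph, filtering the items by the query and keeping a click inside the input from reaching the dropdown, can be tried out without a browser. What follows is only a minimal sketch in plain JavaScript, not the snippet from the original post: `filterItems`, `makeNode` and `dispatchClick` are hypothetical names, the filter assumes a case-insensitive substring match, and the tiny parent chain stands in for the real DOM.

```javascript
// Hypothetical helper: keep the items whose text contains the query,
// ignoring case (assumed here to mimic `item in items | filter:query`).
function filterItems(items, query) {
  if (!query) {
    return items.slice();
  }
  var needle = query.toLowerCase();
  return items.filter(function (item) {
    return item.toLowerCase().indexOf(needle) !== -1;
  });
}

// Toy model of DOM bubbling: a node has a parent and an optional click handler.
function makeNode(parent, onClick) {
  return { parent: parent || null, onClick: onClick || null };
}

// Deliver a click to `target`, then to each ancestor in turn, unless a
// handler calls event.stopPropagation().
function dispatchClick(target) {
  var stopped = false;
  var event = {
    stopPropagation: function () { stopped = true; }
  };
  for (var node = target; node && !stopped; node = node.parent) {
    if (node.onClick) {
      node.onClick(event);
    }
  }
}

// The dropdown closes whenever a click reaches the menu ...
var dropdown = { open: true };
var menu = makeNode(null, function () { dropdown.open = false; });
// ... so the search input stops its clicks from bubbling up,
var searchInput = makeNode(menu, function (event) { event.stopPropagation(); });
// ... while an ordinary menu item lets them through.
var menuItem = makeNode(menu, null);
```

Calling `dispatchClick(searchInput)` leaves `dropdown.open` untouched, whereas `dispatchClick(menuItem)` closes the dropdown, which is the behaviour the paragraph describes.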

Book Review: eCommerce in the Cloud by Kelly Goetsch - O'Reilly


Author Kelly Goetsch, a product manager focusing on large-scale eCommerce solutions, aims at educating eCommerce stakeholders on whether, why and how they could move their IT infrastructure to the cloud.

The book is quite easy to read, mainly due to the fact that the presentation of the technologies and techniques is kept at a high level.

Topics of the book include:

  • Cloud computing related terminology
  • Cloud architectures
  • Availability: how to avoid outages
  • Performance: perform transactions in a reasonable amount of time
  • Automation: reducing errors
  • Elasticity: scaling up and down
  • Security

I would say that the book is suitable for owners and managers of medium to large eCommerce businesses, novices in cloud technologies and distributed computing, who would like to know the terminology and better communicate with their IT personnel on cloud solutions.

I review for the O'Reilly Reader Review Program

Update 2015-09-18: This review was part of the O'Reilly Reader Review Program that is no longer available

SSL/HTTPS server with Node.js and Express.js


So let’s assume the requirement is to create an https server and to redirect any request that arrives over plain http to https. I created this little guide by bundling together a couple of links related to the subject.

We will begin by quickly creating a project using the express-generator:

$ express https-server
$ cd https-server && npm install
$ npm start

Server should be running at http://localhost:3000/. Now let’s create the certificates (Reference):

$ openssl genrsa 2048 > file.pem
$ openssl req -new -key file.pem -out csr.pem
$ openssl x509 -req -days 365 -in csr.pem -signkey file.pem -out file.crt

We assumed no passphrase was used. We can then read the key and the certificate in the entry-point file bin/www:

var fs = require("fs");
var config = {
    key: fs.readFileSync('file.pem'),
    cert: fs.readFileSync('file.crt')
};

The next step is to create two servers, one to listen on http and port 3000 and one on https and port 8000 (Reference). The www file becomes:

#!/usr/bin/env node
var debug = require('debug')('https-server');
var app = require('../app');
var https = require('https');
var http = require('http');
var fs = require("fs");
var config = {
    key: fs.readFileSync('file.pem'),
    cert: fs.readFileSync('file.crt')
};

// plain http on port 3000, https on port 8000
http.createServer(app).listen(3000);
https.createServer(config, app).listen(8000);

Now one can navigate to both http://localhost:3000 and https://localhost:8000 and get the same response. In the latter case the browser shows the usual “proceed with caution” notice, since the certificate is not signed by a trusted authority.

The last step is to redirect traffic that comes in over http to https by using a middleware for all routes (Reference):

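function ensureSecure(req, res, next){
  // requests that already arrived over https go straight through
  if(req.secure){
    return next();
  }
  // req.hostname is used because req.host is deprecated in Express 4
  res.redirect('https://' + req.hostname + ':' + 8000 + req.url);
}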

app.all('*', ensureSecure);

app.use('/', routes);
app.use('/users', users);

So http://localhost:3000 and http://localhost:3000/users redirect to https://localhost:8000 and https://localhost:8000/users respectively.
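To make the mapping explicit, the URL-building part of the redirect can be pulled into a small function and tried without starting a server. This is only a sketch under assumptions: `buildRedirectUrl` is a hypothetical helper, not part of Express, and leaving out the port when it is the standard https port 443 is an extra convenience that the middleware above does not have.

```javascript
// Hypothetical helper: build the https URL that an http request should be
// redirected to. `httpsPort` falls back to the standard https port 443.
function buildRedirectUrl(hostname, originalUrl, httpsPort) {
  var port = httpsPort || 443;
  // The standard https port does not need to appear in the URL.
  var portPart = port === 443 ? '' : ':' + port;
  return 'https://' + hostname + portPart + originalUrl;
}

// The development setup from this post: https listens on port 8000.
console.log(buildRedirectUrl('localhost', '/users', 8000));
// prints "https://localhost:8000/users"
```

Inside the middleware, the hostname and the path would come from the request object and the result would be passed to `res.redirect`.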

The complete code can be found on GitHub.

Last but not least, in production you can redirect traffic to the standard http and https ports, as described in this reference.

Introductory post: Going MEAN


This was the first post I wrote for the meanstack.info blog I have created for all things MEAN, now merged with kyrcha.info, the site you are at.

Dear visitor,

Hi! My name is Kyriakos Chatzidimitriou. If you like, you can find out more on about.me. I like to consider myself an intelligent systems, data and software engineer, and this is my blog about the MEAN stack, i.e. MongoDB, ExpressJS, AngularJS and NodeJS, and of course about JavaScript and JavaScript libraries in general.

At a certain point in time, during the last couple of years, after reading some inspiring books and along with the rise of cloud computing, the software-as-a-service paradigm and start-ups, I wanted to start building things and creating real products that provide real value to real customers.

Being a polyglot has many merits: you can learn a lot by studying other programming languages and gain a fresh perspective on your current dev stack (see Matz’s talk on being a language designer at Euruko 2013), and it is something I am actively pursuing. Still, I also found fascinating the idea that you could have “one language to rule them all”: a lingua franca for building SaaS applications, from the database, to the server side, to the client side. In that respect, MongoDB, NodeJS and ExpressJS were no-brainers for my main dev stack. The last thing to decide was which client-side JS framework to pick up: BackboneJS, EmberJS, AngularJS, CanJS or another? After some digging around I decided to go for AngularJS and complete the puzzle. I’d like to devote a couple of lines to the posts of other developers that got me started with the MEAN stack and helped me decide:

By no means do I consider myself an expert on the MEAN stack at this point. I started using it in September 2013, I’ll always be learning, and along the way I am making this process public. If others can benefit from it, all the better. My familiarity with the JavaScript language and its frameworks is just getting started, so bear with me if you spot any mistakes in my JavaScript. I promise I’ll get better.

I am starting this blog to:

  • give other developers the kind of help I got from blogs like this one,
  • become a better MEAN stack developer by organizing my thoughts well enough to write posts open to public criticism,
  • create a link to the MEAN stack community and receive feedback,
  • have a long-term memory store for practices and techniques I am working on, and
  • provide a reference for future coworkers who are starting with the MEAN stack.

These are my adventures in the world of the MEAN stack …

Best,

– Kyriakos Chatzidimitriou

PS 1. Some links are affiliate links. If you use them, you make it easier for me to maintain the site and get even more books, learn more stuff and write even better posts.

PS 2. Occasionally, M will mean MySQL, since a) some problems suit document databases and others relational ones and b) I am really liking the Sequelize framework.

Calculating the fractal dimension of the Greek coastline (1.25)


Great Britain Box.svg
"Great Britain Box" by Prokofiev - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons.

Inspired by the Introduction to Complexity course and its unit on Fractals, I thought it would be fun to make a rough calculation of the fractal dimension of the Greek coastline using the box counting method.

The box counting method goes as follows:

  1. Split the 2D map that depicts the coastline into squares (boxes) of a certain side length (r) and count the number of boxes (n) that contain part of the coastline.
  2. Decrease the size of the boxes and repeat step 1.
  3. When finished with a series of box sizes, do a linear regression of log(n) on log(1/r).
  4. The slope of the line fitted to the points on the plot is the fractal dimension of the object.
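Before running the method on a real map, it helps to try it on a shape whose dimension is known. The JavaScript sketch below is an illustration under assumptions, not the script used for the coastline: the function names are made up, the input is a small square grid of 0/1 cells instead of an image, and every box size must divide the side of the grid.

```javascript
// Count the boxes of side `boxSize` that contain at least one filled cell.
// `grid` is a square array of rows holding 0 (empty) or 1 (filled);
// `boxSize` must divide the side of the grid.
function countBoxes(grid, boxSize) {
  var size = grid.length;
  var count = 0;
  for (var top = 0; top < size; top += boxSize) {
    for (var left = 0; left < size; left += boxSize) {
      var hit = false;
      for (var y = top; y < top + boxSize && !hit; y++) {
        for (var x = left; x < left + boxSize && !hit; x++) {
          if (grid[y][x] === 1) {
            hit = true;
          }
        }
      }
      if (hit) {
        count++;
      }
    }
  }
  return count;
}

// Least-squares slope of ys against xs.
function slope(xs, ys) {
  var n = xs.length;
  var meanX = xs.reduce(function (a, b) { return a + b; }, 0) / n;
  var meanY = ys.reduce(function (a, b) { return a + b; }, 0) / n;
  var num = 0;
  var den = 0;
  for (var i = 0; i < n; i++) {
    num += (xs[i] - meanX) * (ys[i] - meanY);
    den += (xs[i] - meanX) * (xs[i] - meanX);
  }
  return num / den;
}

// Steps 1 to 4: regress log(n) on log(1/r) over the given box sizes.
function boxCountingDimension(grid, boxSizes) {
  var xs = boxSizes.map(function (r) { return Math.log(1 / r); });
  var ys = boxSizes.map(function (r) { return Math.log(countBoxes(grid, r)); });
  return slope(xs, ys);
}

// Toy input (an assumption, not the coastline image): a 64x64 grid with a
// straight diagonal line, whose dimension should come out close to 1.
var diagonal = [];
for (var i = 0; i < 64; i++) {
  diagonal.push([]);
  for (var j = 0; j < 64; j++) {
    diagonal[i].push(i === j ? 1 : 0);
  }
}
console.log(boxCountingDimension(diagonal, [16, 8, 4, 2]));
```

For the diagonal, halving the box side doubles the number of boxes that are hit (4, 8, 16, 32), so the fitted slope is 1 up to floating point error. A completely filled grid gives a slope of 2 by the same reasoning.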

For a map of Greece, I used the one from Ginko maps, licensed under Creative Commons Attribution 3.0. In an image editor, I removed the frame with the infobox and the geolocation axes, plus the land borders (those not on the sea), to facilitate further image processing. The retouched image was cropped to 1600×1600 pixels. Both images are shown below.

Original map


Retouched map


The R script below implements the box counting method on the coastline jpeg picture (after setting all values > 0.5 to white), using boxes with sides that are divisors of 1600.

The coastline map

coastline

The R script

  
    library(jpeg)
    img = readJPEG("coastline.jpeg")
    # filter out mainland: push every value above 0.5 to white
    img[img > 0.5] = 1
    # divisors of 1600
    # 1,2,4,5,8,10,16,20,25,32,40,50,64,80,100,160,200,320,400,800,1600
    boxSizes = c(50, 40, 32, 25, 20, 16, 10, 8, 5, 4)
    # work on the first colour channel only
    h = img[,,1]
    data = data.frame()
    for(b in boxSizes) {
      # number of boxes per side
      ratio = nrow(h)/b
      # label every pixel with the index of the box it falls in
      # https://stat.ethz.ch/pipermail/r-help/2012-February/303163.html
      boxIds = kronecker(matrix(1:(ratio^2), ratio, byrow = TRUE), matrix(1,b,b))
      g = lapply(split(h,boxIds), matrix, nrow = b)
      # count the boxes that contain at least one non-white pixel
      counter = 0
      for(i in 1:length(g)) {
        counter = counter + any(g[[i]] < 0.999)
      }
      data = rbind(data,c(log(counter),log(1/b)))
    }
    names(data) = c("Y", "X")
    # the slope (coefficient of X) is the fractal dimension estimate
    model = lm(Y~., data=data)
    cat(coef(model), "\n")
  

The plot

plot

With this rough approximation, the calculation yielded a fractal dimension of 1.25 for the Greek coastline. For comparison, Great Britain’s was measured to be 1.25 and Norway’s 1.52 [source].

My public Evernote notebook for posting my how tos


Install RockMongo on Mac OS X


This is a simple guide for installing RockMongo on Mac OS X (my setup was 10.8.2). The following versions of RockMongo and XAMPP were used:

  • RockMongo, version 1.1.5
  • XAMPP Mac OS X, version 1.7.3
  • XAMPP Developer package

The setup has four steps:

  1. Install XAMPP
  2. Install XAMPP Developer package
  3. Run the command:
    sudo /Applications/XAMPP/xamppfiles/bin/pecl install mongo
  4. Copy the extracted RockMongo directory to XAMPP's htdocs

2013 and beyond, todo list


This time my new year’s resolutions are here to stay. For life. I hope to form them sometime soon into my personal constitution, in line with proactiveness, one of the seven habits of highly effective people. In addition, I plan to hold an annual review to keep up with more specific roles and goals for 2013. To cut a long story short, my todo list is:

  1. To live a life true to myself
  2. Not to work so hard
  3. To have the courage to express my feelings
  4. To stay in touch with my friends
  5. To be happier
  6. To aim high
  7. To be modest (“You don’t know what you don’t know”)
  8. To have passion
  9. To build my character taking also into account values and virtues I admire in others
  10. To believe in myself
  11. To work with people I like and have fun with
  12. To be surrounded with people with positive energy
  13. To be patient and not to give up
  14. To admit my mistakes
  15. To be lucky

OK, I know the last one is not up to me, but I interpret it as “Don’t run for trains”. Note: you must have read the Black Swan book in order to understand this one.

The first five are taken from the blog post “REGRETS OF THE DYING”, while the next ten are from a talk by Nikos Stathopoulos (in Greek) about the habits of highly effective people.