<p>Playing with water - Louis-Philippe Véronneau, 2018-03-14 - machine_learning - https://veronneau.org/</p>
<p><img src="/media/blog/2018-03-14/h2o_job.png" title="H2o Flow gradient boosting job" alt="H2o Flow gradient boosting job" height="30%" width="30%" style="float:right"></p>
<p>I'm currently taking a machine learning class and although it is an insane
amount of work, I like it a lot. I initially had planned to use <a href="https://en.wikipedia.org/wiki/R_(programming_language)">R</a> to play
around with the database I have, but the teacher recommended I use <a href="https://www.h2o.ai">H2o</a>, a
FOSS machine learning framework.</p>
<p>I was a bit sceptical at first since I'm already pretty good with R, but then I
found out you could simply import H2o as an R library. H2o replaces most R
functions with its own parallelized ones to cut down on processing time (no more
<code>doParallel</code> calls) and uses an "external" server you have to run on the side
instead of running R calls directly.</p>
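<p>To make that client/server split concrete, here is a minimal sketch using the Python hook rather than R. It assumes the <code>h2o</code> Python package is installed and that a server is already running on H2o's default port, 54321. The helper names <code>server_url</code> and <code>connect</code> are made up for illustration.</p>

```python
def server_url(host="localhost", port=54321):
    """Build the URL of an H2O server; 54321 is H2O's default port."""
    return f"http://{host}:{port}"


def connect(host="localhost", port=54321):
    """Attach to an H2O server that is already running on the side."""
    # Imported lazily so server_url() stays usable without h2o installed.
    import h2o

    # h2o.connect() attaches to an existing server; it does not start one.
    return h2o.connect(url=server_url(host, port))


# Typical use, once the server is up:
#   connect()
```

<p>The client only sends requests to that address; the actual computation happens in the Java server process.</p>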
<p><img src="/media/blog/2018-03-14/h2o_model.png" title="H2o Flow gradient boosting model" alt="H2o Flow gradient boosting model" height="30%" width="30%" style="float:left"></p>
<p>I was pretty happy with this situation, that is, until I actually started using
H2o in R. With the huge database I'm playing with, the library felt clunky and I
had a hard time doing anything useful. Most of the time, I just ended up with
long Java tracebacks. Much love.</p>
<p>I'm sure in the right hands using H2o as a library could have been incredibly
powerful, but sadly it seems I haven't earned my black belt in R-fu yet.</p>
<p><img src="/media/blog/2018-03-14/h2o_var_importance.png" title="H2o Flow variable importance weights" alt="H2o Flow variable importance weights" height="30%" width="30%" style="float:right"></p>
<p>I was pissed for at least a whole day - not being able to achieve what I wanted
to do - until I realised H2o comes with a WebUI called Flow. I'm normally not
very fond of using web thingies to do important work like writing code, but Flow
is simply incredible.</p>
<p>Automated graphing functions, integrated ETA when running resource intensive
models, descriptions for each and every model parameter (the parameters are
even divided into sections based on your familiarity with the statistical models in
question): Flow seemingly has it all. In no time I was able to run 3 basic
machine learning models and get actual interpretable results.</p>
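<p>For comparison, a gradient boosting job with variable importances can also be scripted through the Python hook. This is only a sketch, not the setup used here: the <code>path</code> and <code>target</code> arguments are hypothetical, the helper names are made up, and it assumes the <code>h2o</code> Python package and a working Java install.</p>

```python
def predictor_columns(columns, target):
    """Every column except the target is used as a predictor."""
    return [c for c in columns if c != target]


def train_gbm(path, target, ntrees=50):
    """Train a gradient boosting model and return its variable importances."""
    # Imported lazily so predictor_columns() works without h2o installed.
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    h2o.init()  # connects to a local server, starting one if needed
    frame = h2o.import_file(path)  # hypothetical dataset path

    model = H2OGradientBoostingEstimator(ntrees=ntrees)
    model.train(
        x=predictor_columns(frame.columns, target),
        y=target,
        training_frame=frame,
    )
    # With use_pandas=False, varimp() returns a plain list of tuples.
    return model.varimp(use_pandas=False)
```

<p>Flow exposes the same parameters through its forms, which makes it an easier place to start.</p>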
<p>So yeah, if you've been itching to analyse very large databases using
state-of-the-art machine learning models, I would recommend using H2o. Try Flow at first
instead of the Python or R hooks to see what it's capable of doing.</p>
<p>The only downside to all of this is that H2o is written in Java and depends on
Java 1.7 to run... That, and be warned: it requires a metric fuckton of
processing power and RAM. My poor server struggled quite a bit even with 10
available cores and 10 GB of RAM...</p>
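<p>If you start the server from the Python hook, you can at least cap what it grabs. A minimal sketch, assuming the <code>h2o</code> Python package; the helper names are made up, and the defaults simply mirror the 10 cores and 10 GB mentioned above.</p>

```python
def memory_setting(gigabytes):
    """Format a heap size string such as '10G' for h2o.init(max_mem_size=...)."""
    return f"{int(gigabytes)}G"


def start_limited(cores=10, ram_gb=10):
    """Start a local H2O server restricted to a number of cores and a heap size."""
    # Imported lazily so memory_setting() works without h2o installed.
    import h2o

    # These limits only apply when init() launches a new server,
    # not when it attaches to one that is already running.
    h2o.init(nthreads=cores, max_mem_size=memory_setting(ram_gb))
```

<p>This does not make large models any cheaper to train, it just keeps the Java process from eating the whole machine.</p>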