• ## Is your ML model worth putting in production?

You are developing a Machine Learning model for an industry application. How do you measure the model’s performance? How do you know whether the model is doing its job correctly? In my experience, there are two common approaches to this. Use a normal Data Science metric like F1-score, average precision, accuracy or whatever suits…

• ## Fast optimization of classification thresholds

Binary classification problems (target/non-target) are often modeled as a pair $(f, t)$ where $f$ is our model, which maps input vectors to scores, and $t$ is our threshold, such that we predict $x$ to be of target class iff $f(x) \geq t$. Otherwise, we predict it to be of non-target class. The threshold is usually set to $0.5$, but this need not…
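
As a minimal sketch of the decision rule described above: given scores already produced by some model, we predict the target class wherever the score reaches the threshold. The function name `predict`, the example scores and the default threshold of `0.5` are assumptions made for illustration, not taken from the post.

```python
def predict(scores, threshold=0.5):
    """Return True (target class) wherever score >= threshold.

    `threshold=0.5` is an assumed default for illustration only.
    """
    return [score >= threshold for score in scores]


# Hypothetical model scores for four inputs.
scores = [0.1, 0.4, 0.5, 0.9]

print(predict(scores))                 # default threshold
print(predict(scores, threshold=0.3))  # a lower threshold flags more inputs
```

Lowering the threshold can only turn non-target predictions into target predictions, which is why the threshold is worth tuning rather than leaving at a default.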

• ## Storing a list in an int

Python’s default ints, unlike in C, Rust or Go, are of arbitrary size. What that means is there’s no absolute maximum value your ints can store. They’ll grow as long as they fit in memory. For example, you can open python3 and run the following. So a normal, everyday int in Python can easily store…
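
A small illustration of the arbitrary-size behaviour described above. This is a sketch, not necessarily the snippet the post itself uses; the variable name `big` and the choice of `2 ** 200` are arbitrary.

```python
# A value far beyond what a 64-bit signed integer can hold.
big = 2 ** 200

print(type(big))           # still a plain int
print(big > 2 ** 63 - 1)   # larger than the 64-bit signed maximum
print(big.bit_length())    # number of bits needed to represent it
```

No special big-integer type or import is needed; the same `int` simply uses more memory as the value grows.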

• ## FBSim: football-playing AI agents in Rust

I took a two-week vacation in early November. Somehow I decided to spend it learning a bit more about Rust and Reinforcement Learning (RL), a sub-field of AI that I haven’t explored much before. We won’t be talking about RL in this post, though. That’s for a future blog post. All of that led to me…

• ## A 14th century proof of the divergence of the harmonic series

Nicole d’Oresme was a philosopher from 14th-century France. He’s credited with finding the first proof of the divergence of the harmonic series. In other words, he authored the first proof we know of for the fact that $\sum_{n=1}^{\infty} \frac{1}{n}$ is infinite. His proof is very simple, so much so that I think probably someone with more…
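
For reference, the grouping argument usually attributed to Oresme can be sketched as follows. This is the standard textbook form of the argument, not necessarily the exact presentation used in the post.

$$
\sum_{n=1}^{\infty} \frac{1}{n}
= 1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \cdots + \frac{1}{8}\right) + \cdots
\;\geq\; 1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \cdots
$$

Each parenthesized group ending at $\frac{1}{2^k}$ has $2^{k-1}$ terms, each at least $\frac{1}{2^k}$, so the group sums to at least $\frac{1}{2}$. The sum of the first $2^k$ terms is therefore at least $1 + \frac{k}{2}$, which grows without bound.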

• ## Add __init__.py files if using pkg_resources

If you’re working on a Python package, and you’re using setuptools’s pkg_resources for handling data files in your package, then you should add __init__.py files to all your internal sub-directories. In fact, just make your life easier and always add __init__.py files everywhere as if you were using Python 2, because not having them also creates…

• ## Average Precision is sensitive to class priors

Average Precision (AP) is an evaluation metric for ranking systems that’s often recommended for use with imbalanced binary classification problems, especially when the classification threshold (i.e. the minimum score to be considered a positive) is variable, or not yet known. When you use AP for classification you’re essentially trying to figure out whether a classifier…

• ## On ways in which DynamoDB won’t scale

Note: I wrote this post on another blog in June 2018, then moved it over here. A lot of the work I’ve done for the past two months has revolved around one huge DynamoDB table and a few smaller tables complementing the huge one. I thought I’d share some of what I learned in…