The Considerate Data Modeler
What’s your hot take on relational databases? Here’s mine: Data modeling is closer to library science than computer science. No one is impressed by a librarian who gets creative and files cookbooks under “G” for “Gourmand.” The best catalog system is one where everything is in an obvious place, where everything conforms to expectations.
Why the Monty Hall Problem Drives People Crazy
This essay isn’t to explain the solution to the Monty Hall problem—you can look that up anywhere—but to ask a related question: why does it seem to drive some people crazy? Why do they get so attached to their wrong answers, and so upset by the correct answer? That’s weird, right?
Modeling Cycles of Grift with Evolutionary Game Theory
We are in a golden age of grift. Where adventurers once flocked to California or the Yukon because “there was gold in them thar hills,” the fastest way to get rich today is by fleecing suckers. We’ve got crypto rug pulls, meme stocks, nutritional supplements, MLMs—anything to make a quick buck.
A Modest Definition of Human Consciousness
I bring you news of the single most important intellectual discovery of our generation: the hard problem of human consciousness has been solved.
For a long time, the ability to select squares containing traffic lights was our best working definition of what it meant to be truly, deeply, authentically human, but this was never quite satisfactory.
The Prehistory of Computing, Part II
In part I of this two-part series we covered lookup tables and simple devices with at most a handful of moving parts. This time we’ll pick up in the 17th century, when computing devices started to become far more complex and the groundwork for later theoretical work began to be laid.
The Prehistory of Computing, Part I
What is a computer, really? Where did it come from? When did we realize we could trick rocks into doing our math homework for us?
In this two-part series, I’ll cover the origin and early history of computing and computer science, starting in prehistoric Africa and ending in Victorian-era England.
The Art and Mathematics of Genji-Kō
You might think it’s unlikely for any interesting mathematics to arise from incense appreciation, but that’s only because you’re unfamiliar with the peculiar character of Muromachi (室町) era Japanese nobles.
There has never been a group of people, in any time or place, who were so driven to display their sophistication and refinement.
A Picture is Worth 170 Tokens: How Does GPT-4o Encode Images?
Here’s a fact: GPT-4o charges 170 tokens to process each 512x512 tile used in high-res mode. At ~0.75 tokens/word, this suggests a picture is worth about 227 words—only a factor of four off from the traditional saying.
(There’s also an 85-token charge for a low-res ‘master thumbnail’ of each picture, and higher-resolution images are broken into many such 512x512 tiles, but let’s just focus on a single high-res tile.
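The arithmetic behind that estimate can be checked in a few lines. This is only a sketch: the constant names are made up for illustration, and the values are the ones quoted in the text above.

```python
# Figures quoted in the text; the variable names are hypothetical.
TOKENS_PER_TILE = 170    # one 512x512 high-res tile
TOKENS_THUMBNAIL = 85    # low-res master thumbnail, charged once per picture
TOKENS_PER_WORD = 0.75   # rough ratio used in the text

# How many words the tile's token budget would buy.
words_per_tile = TOKENS_PER_TILE / TOKENS_PER_WORD

# How far that falls short of the proverbial thousand words.
shortfall = 1000 / words_per_tile

# Cost of a picture that fits in a single tile, thumbnail included.
single_tile_total = TOKENS_PER_TILE + TOKENS_THUMBNAIL

print(round(words_per_tile))  # 227
print(round(shortfall, 1))    # 4.4
print(single_tile_total)      # 255
```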
Let's Play Jeopardy! with LLMs
How good are LLMs at trivia? I used the Jeopardy! dataset from Kaggle to benchmark ChatGPT and the new Llama 3 models. Here are the results:
There you go. You’ve already gotten 90% of what you’re going to get out of this article. Some guy on the internet ran a half-baked benchmark on a handful of LLM models, and the results were largely in line with popular benchmarks and received wisdom on fine-tuning and RAG.
Stacking Triangles for Fun and Profit
One thing you may have noticed about the trigonometric functions sine and cosine is that they seem to have no agreed-upon definition. Or rather, different authors choose different definitions as the starting point, mainly based on convenience. This isn’t problematic or even particularly unusual in mathematics: as long as we can derive any of the other forms from any starting point, it makes little difference which we start from.
Kaprekar's Magic 6174
Kaprekar’s routine is a simple arithmetic procedure on four-digit numbers which rapidly converges to the fixed point 6174, known as the
Kaprekar constant. Unlike other famous iterative procedures such as the
Collatz function, the ad hoc nature of the Kaprekar routine
doesn’t hint at fundamental mathematical discoveries yet to be made.
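For readers who have not met the routine, here is a minimal sketch of its standard form: sort the digits in descending and ascending order, subtract the smaller number from the larger, and repeat. The function names are hypothetical and are not taken from the article.

```python
def kaprekar_step(n: int) -> int:
    """Apply one step of Kaprekar's routine to a four-digit number."""
    digits = f"{n:04d}"  # pad with leading zeros so there are always four digits
    descending = int("".join(sorted(digits, reverse=True)))
    ascending = int("".join(sorted(digits)))
    return descending - ascending


def kaprekar_iterations(n: int) -> list[int]:
    """Return every value visited from n until the fixed point 6174."""
    if not 0 <= n <= 9999:
        raise ValueError("expected a number with at most four digits")
    if len(set(f"{n:04d}")) < 2:
        # Repdigits such as 1111 collapse to 0 and never reach 6174.
        raise ValueError("need at least two distinct digits")
    path = [n]
    while path[-1] != 6174:
        path.append(kaprekar_step(path[-1]))
    return path


print(kaprekar_iterations(3524))  # [3524, 3087, 8352, 6174]
```

Starting from 3524, the routine computes 5432 - 2345 = 3087, then 8730 - 0378 = 8352, then 8532 - 2358 = 6174, after which it stays put because 7641 - 1467 = 6174.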
Cracking Playfair Ciphers
In 2020, the Zodiac 340 cipher was finally cracked by amateur code breakers after more than 50 years of attempts. While the effort to crack it was extremely impressive, the cipher itself was ultimately disappointing. It was a homophonic substitution cipher with a minor gimmick of writing diagonally; the main factor that prevented it from being solved much earlier was the several errors the Zodiac killer made when encoding it.
My Dinner with ChatGPT
It's hard to talk about ChatGPT without cherry-picking. It's too easy to try a dozen different prompts, refresh each a handful of times, and report the most interesting or impressive thing from those sixty trials. While this problem plagues a lot of the public discourse around generative models, cherry-picking is particularly problematic for ChatGPT because it's actively using the chat history as context.
A History of Encabulation
To celebrate the 100th anniversary of the birth of encabulation - dated from Dr. Wolfgang Albrecht Klossner’s first successful run in that historic barn on the outskirts of Eisenhüttenstadt - this article* collects in one place a number of resources that provide, if not a comprehensive history, at least a catalogue of the major milestones and concepts.
Eight Billion People
Today is the last day when the number of people alive will start with a seven. Sometime late Tuesday afternoon, or perhaps early Wednesday morning, the population will cross the eight billion mark. When I was a kid, the number they taught us in school was five billion. By the time I was in college, it was up to six, and a decade ago it hit seven.
ML From Scratch, Part 6: Principal Component Analysis
In the previous article in this series we distinguished between two kinds of unsupervised learning (cluster analysis and dimensionality reduction) and discussed the former in some detail. In this installment we turn our attention to the latter.
In dimensionality reduction we seek a function $f : \mathbb{R}^n \to \mathbb{R}^m$ where $n$ is the dimension of the original data $\mathbf{X}$ and $m$ is less than or equal to $n$.
A Seriously Slow Fibonacci Function
I recently wrote an article which was ostensibly about the Fibonacci series but was really about optimization techniques. I wanted to follow up on its (extremely moderate) success by going in the exact opposite direction: by writing a Fibonacci function which is as slow as possible.
This is not as easy as it sounds: any program can trivially be made slower, but this is boring.
ML From Scratch, Part 5: Gaussian Mixture Models
Consider the following motivating dataset:
It is apparent that these data have some kind of structure; which is to say, they certainly are not drawn from a uniform or other simple distribution. In particular, there is at least one cluster of data in the lower right which is clearly separate from the rest.
Adaptive Basis Functions
Today, let me be vague. No statistics, no algorithms, no proofs. Instead, we’re going to go through a series of examples and eyeball a suggestive series of charts, which will imply a certain conclusion, without actually proving anything; but which will, I hope, provide useful intuition.
The premise is this:
ML From Scratch, Part 4: Decision Trees
So far in this series we’ve followed one particular thread: linear regression -> logistic regression -> neural network. This is a very natural progression of ideas, but it really represents only one possible approach. Today we’ll switch gears and look at a model with a completely different pedigree: the decision tree, sometimes also referred to as Classification and Regression Trees, or simply CART models.
A Fairly Fast Fibonacci Function
A common example of recursion is the function to calculate the $n$-th Fibonacci number:
    def naive_fib(n):
        if n < 2:
            return n
        else:
            return naive_fib(n-1) + naive_fib(n-2)

This follows the mathematical definition very closely but its performance is terrible: roughly $\mathcal{O}(2^n)$. This is commonly patched up with dynamic programming. Specifically, either the memoization:
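The memoized variant can be sketched as follows. This is a minimal illustration, assuming Python's standard `functools.lru_cache` as the cache; the name `memo_fib` is hypothetical, and the article's own version is not shown in this excerpt.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # remember every result so each n is computed only once
def memo_fib(n: int) -> int:
    """Same recurrence as the naive version, with cached subproblems."""
    if n < 2:
        return n
    return memo_fib(n - 1) + memo_fib(n - 2)


print(memo_fib(50))  # 12586269025
```

With the cache in place, the number of distinct calls grows linearly in $n$ rather than exponentially. The recursion is still as deep as $n$, so very large inputs can hit Python's recursion limit.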
ML From Scratch, Part 3: Backpropagation
In today’s installment of Machine Learning From Scratch we’ll build on the logistic regression from last time to create a classifier which is able to automatically represent non-linear relationships and interactions between features: the neural network. In particular I want to focus on one central algorithm which allows us to apply gradient descent to deep neural networks: the backpropagation algorithm.
ML From Scratch, Part 2: Logistic Regression
In this second installment of the Machine Learning From Scratch series we switch the point of view from regression to classification: instead of estimating a number, we will be trying to guess which of 2 possible classes a given input belongs to. A modern example is looking at a photo and deciding if it’s a cat or a dog.
ML From Scratch, Part 1: Linear Regression
To kick off this series, we’ll start with something simple yet foundational:
linear regression via ordinary least squares. While not particularly
exciting, linear regression finds widespread use both as a standalone
learning algorithm and as a building block in more advanced learning
algorithms.
ML From Scratch, Part 0: Introduction
Motivation
As an apprentice, every new magician must prove to his own satisfaction, at least once, that there is truly great power in magic. —The Flying Sorcerers, by David Gerrold and Larry Niven
How do you know if you really understand something? You could just rely on the subjective experience of feeling like you understand.
Visualizing Multiclass Classification Results
Introduction
Visualizing the results of a binary classifier is already a challenge, but having more than two classes aggravates the matter considerably.
Let’s say we have $k$ classes. Then for each observation, there is one correct prediction and $k-1$ possible incorrect predictions. Instead of a $2 \times 2$ confusion matrix, we have $k^2$ possibilities.
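To make the $k^2$ count concrete, here is a small sketch that tallies a $k \times k$ confusion matrix from paired labels. The function name and the toy labels are invented for illustration; classes are assumed to be numbered $0$ to $k-1$.

```python
from collections import Counter


def confusion_matrix(actual, predicted, k):
    """Build a k x k table: rows are true classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(t, p)] for p in range(k)] for t in range(k)]


# Toy labels, made up for this example.
actual = [0, 0, 1, 1, 2, 2, 2]
predicted = [0, 1, 1, 1, 2, 0, 2]

matrix = confusion_matrix(actual, predicted, k=3)
for row in matrix:
    print(row)
# [1, 1, 0]
# [0, 2, 0]
# [1, 0, 2]
```

The $k$ diagonal cells hold the correct predictions, and the remaining $k(k-1)$ cells each record a distinct kind of mistake.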
Craps Variants
Craps is a surprisingly fair game. I remember calculating the probability of winning craps for the first time in an undergraduate discrete math class: I went back through my calculations several times, certain there was a mistake somewhere. How could it be closer than $\frac{1}{36}$?
(Spoiler Warning: If you haven’t calculated these odds for yourself then you may want to do so before reading further.
Complex Numbers in R, Part II
This post is part of a series on complex number functionality in the R programming language. You may want to read Part I before continuing if you are not already comfortable with the basics.
In Part I of this series, we dipped our toes in the water by explicitly creating some complex numbers and showing how they worked with the most basic mathematical operators, functions, and plots.
Complex Numbers in R, Part I
R, like many scientific programming languages, has first-class support for
complex numbers. And, just as in most other programming languages, this
functionality is ignored by the vast majority of users. Yet complex numbers
can often offer surprisingly elegant formulations and solutions to problems.
So, Apparently I'm an iPad Developer Now
Last week my boss stopped by and dropped a brand spanking new iPad on my desk. "Make our application work on this," he commanded. "You have two days before we demo it at the trade show." Madness? No, these are web apps! You see, for the last couple years we've been working exclusively on AJAX applications: web pages stuffed with so much JavaScript they look and feel like desktop apps.
Deep Copy in JavaScript
Update 2017-10-23: This article and code library have not kept up with the rapidly changing JavaScript landscape and are now hopelessly out of date. First came non-enumerable properties, and with ES2015 came the introduction of classes, proxies, symbols, and anonymous functions, all of which break the below logic. I'm afraid I no longer know how to fully copy the full menagerie of JavaScript objects correctly.
Semantic Code
se-man-tic (si-man’tik) adj. 1. Of or relating to meaning, especially meaning in language.
Programming destroys meaning. When we program, we first replace concepts with symbols and then replace those symbols with arbitrary codes — that’s why it’s called coding.
At its worst programming is write-only: the program accomplishes a task, but is incomprehensible to humans.