Annotating the Annotated Transformer

The Annotated Transformer is a detailed and instructive guide, offering comprehensive insights into the original Transformer architecture. I decided to add my own notes to make it clearer. In this...

Language models and transformers from scratch

I recently did some exercises on (small) language models. The field is still quite foreign to me, so the only way to appreciate it better is to start from...

Statistical Mechanics and Statistical Inference

I have to confess that when I was a physics student, I thought taking classes on probability theory and statistics was an unnecessary distraction from learning “real physics”. But while...

Evaluation Stores - a high bias, low variance view

“Feature Store” has been one of the hottest buzzwords in the machine learning community in recent years. In my view, however, “Evaluation Store” should be of equal or higher priority...

From Laplace to Neural Networks (Part 2)

We continue the discussion from Part 1, but now using neural networks. Can a neural network really predict a nonlinear system like the double pendulum? Well, we know from the universal approximation theorem...
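
Even at the level of this teaser, the setup is concrete enough to sketch. Below is a minimal, self-contained sketch of one way to pose the problem (my own illustrative code, not the post's): integrate a double pendulum with scipy, then train a small scikit-learn MLP to predict the state one step ahead. The masses, lengths, initial condition, and network size are all assumptions chosen for illustration.

import numpy as np
from scipy.integrate import solve_ivp
from sklearn.neural_network import MLPRegressor

G, L1, L2, M1, M2 = 9.8, 1.0, 1.0, 1.0, 1.0  # illustrative parameters

def derivs(t, s):
    # Standard double-pendulum equations of motion; s = [th1, w1, th2, w2].
    th1, w1, th2, w2 = s
    d = th2 - th1
    den1 = (M1 + M2) * L1 - M2 * L1 * np.cos(d) ** 2
    a1 = (M2 * L1 * w1 ** 2 * np.sin(d) * np.cos(d)
          + M2 * G * np.sin(th2) * np.cos(d)
          + M2 * L2 * w2 ** 2 * np.sin(d)
          - (M1 + M2) * G * np.sin(th1)) / den1
    den2 = (L2 / L1) * den1
    a2 = (-M2 * L2 * w2 ** 2 * np.sin(d) * np.cos(d)
          + (M1 + M2) * (G * np.sin(th1) * np.cos(d)
                         - L1 * w1 ** 2 * np.sin(d)
                         - G * np.sin(th2))) / den2
    return [w1, a1, w2, a2]

# Generate one trajectory and build one-step-ahead (state_t, state_t+1) pairs.
t = np.linspace(0, 50, 5001)  # dt = 0.01
sol = solve_ivp(derivs, (0, 50), [2.0, 0.0, 1.0, 0.0], t_eval=t, rtol=1e-9)
states = sol.y.T  # shape (5001, 4)
X, y = states[:-1], states[1:]

split = 4000  # train on the first 40 s, test on the last 10 s
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
model.fit(X[:split], y[:split])
print("one-step R^2 on held-out tail:", model.score(X[split:], y[split:]))

One-step prediction is the easier half of the story; the chaos of the double pendulum really bites when such a model is rolled out recursively over many steps.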

From Laplace to Neural Networks (Part 1)

Time-series prediction has always been interesting to me (and always very hard for me), but I realized that I had never thought about predicting the time series of a physical system...

Biases in logistic regression - it is not about N (Part 1)

Here is a short script I used to run often:

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X, y)

But there could be a problem in this naive implementation of...
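
For what it is worth, one well-known pitfall of that exact snippet (which may or may not be the one the post dissects) is that scikit-learn's LogisticRegression regularizes by default: it applies an L2 penalty with C=1.0, so the fitted coefficients are shrunk relative to the plain maximum-likelihood estimates. A minimal sketch on synthetic data:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: a logistic model with known coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_beta = np.array([2.0, -1.0, 0.5])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(int)

default_fit = LogisticRegression().fit(X, y)  # L2 penalty, C=1.0 (the default)
# penalty=None needs scikit-learn >= 1.2; older versions use penalty='none'.
mle_fit = LogisticRegression(penalty=None).fit(X, y)  # unpenalized MLE

print("default (shrunk) coefficients:", default_fit.coef_.round(2))
print("unpenalized MLE coefficients :", mle_fit.coef_.round(2))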