18 Jan 2019 • on Data Science Predictions Algorithms

Data Science and the Paradox of Predictions

Many data science projects are a hunt for knowledge. As history has taught us, the mere act of knowing can change the very thing we believe we know.

Professor Harari explores this topic in Homo Deus with the skill we’ve become accustomed to in his work. Using the example of Marx’s “Das Kapital”, Harari clarifies an idea that translates into a very valuable lesson.

A Capitalist Reform

In the mid-nineteenth century, Marx produced a brilliant set of insights into the economics of the day. He also predicted a violent uprising in the West and the collapse of the capitalist system, a system which, for the most part, remains in place today. While Russia eventually went down that road in many ways, his predictions never came true in the West. But why?

One reason is that capitalists know how to read. As Marx became an ever more prominent figure in political circles, the capitalists of the day used his insights to change the future. They understood how capitalism had failed so many and that, if something did not change, the whole system would come tumbling down.

As people started to see and believe what Marx was predicting, their behavior changed. The system changed. And, while Marx brought a valuable and permanent addition to the human knowledge base, his more radical predictions were mostly wrong. Wrong even though they were probably right when they were made.

However, without the changes made in response to Marx’s work, it is entirely likely that the West’s political system would have fallen victim to his predictions.

Paradox of Predictions

Thus we are faced with the paradox of prediction.

In the process of making predictions we often uncover valuable new knowledge. That new knowledge, and the predictions it begets, change the system. Harari puts it this way:

Knowledge that does not change behavior is useless. But knowledge that does change behavior quickly loses its relevance.

The same paradox applies to many prediction systems in data science. As we collect ever more data, we create ever more powerful prediction systems. Those systems alter behavior, so the data they were built on, and the systems themselves, no longer accurately describe the present. We must rework parts of the system or rebuild it entirely.
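To make the decay concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the "model" is a single cutoff learned from made-up data, and the "world" is a hypothetical rule whose true cutoff shifts as behavior changes. It only shows how a model that was perfect on yesterday's data loses accuracy once the system it describes moves.

```python
def world(xs, true_cut):
    """Hypothetical ground truth: the event happens when x reaches true_cut."""
    return [x >= true_cut for x in xs]


def accuracy(cut, xs, ys):
    """Share of cases where the rule 'predict the event when x >= cut' is right."""
    hits = sum((x >= cut) == y for x, y in zip(xs, ys))
    return hits / len(xs)


def fit_threshold(xs, ys):
    """Pick the cutoff that scores best on the training data (first one wins ties)."""
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(xs)):
        acc = accuracy(cut, xs, ys)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut


xs = list(range(100))

# Learned once, from data collected before anyone reacted to the predictions.
cut = fit_threshold(xs, world(xs, true_cut=50))

# Behavior shifts over time; the frozen model is scored against each new world.
for shifted_cut in (50, 60, 70, 80):
    print(shifted_cut, accuracy(cut, xs, world(xs, shifted_cut)))
```

The model itself never changes in this loop. Only the world does, and that is enough to erode its accuracy.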

The Imperfection of Perfection

Think about a system that tries to predict stock market prices. The deluge of data available today is being used in ever more creative ways, yet hedge funds are not seeing monumental returns compared to previous years.

Many trading strategies employed by funds base themselves on missteps in the market: overpricing events, underpricing events, or other price errors. The Efficient Market Hypothesis largely holds, but ‘errors’ still occur.

Let us say that you build a system that accurately predicts which companies are underpriced every day. You use a novel system that takes a unique approach and thus believe you can hold on to your advantage. But as history has shown us, few inventions are created in isolation. If you can find it, chances are someone else will too.

So you do not merely turn your system on and walk away. You keep working to improve it, use more data, better data, more computing power. Eventually, someone catches up. Then another. Then another. Your model no longer makes the money it used to and perhaps it hardly makes a profit. You need to move on to the next model.

The same can hold for less complex models. You may be building a system to predict which customers are about to leave your company. That system lets you intervene and change the ultimate behavior, which means future data no longer resembles the data you used to build it. As time goes on, you must continue to update and develop your system, because it will slowly become less accurate the more you intervene.
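A small Python sketch of that churn example follows. All names and numbers here are hypothetical: the risk scores, the 0.5 flagging cutoff, and the assumption that every second retention offer succeeds are invented purely to illustrate the effect. The point is that once you act on the predictions, the logs make a correct model look wrong.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Customer:
    risk_score: float   # hypothetical model output in [0, 1]
    would_churn: bool   # outcome if nobody intervenes (not observable in practice)


def logged_outcomes(customers: List[Customer], flag_above: float,
                    save_every: Optional[int]) -> List[Tuple[bool, bool]]:
    """Return (flagged, churned) pairs as they would appear in the logs.

    save_every=None means no retention offers are made. Otherwise every
    save_every-th flagged churner is assumed to stay (a made-up success rate).
    """
    rows = []
    offers_made = 0
    for c in customers:
        flagged = c.risk_score >= flag_above
        churned = c.would_churn
        if flagged and churned and save_every is not None:
            offers_made += 1
            if offers_made % save_every == 0:
                churned = False  # the intervention worked
        rows.append((flagged, churned))
    return rows


def precision(rows: List[Tuple[bool, bool]]) -> float:
    """Share of flagged customers who are logged as having churned."""
    outcomes = [churned for flagged, churned in rows if flagged]
    return sum(outcomes) / len(outcomes)


# Twenty made-up customers; the top half are the ones who would really leave.
customers = [Customer(i / 20, i >= 10) for i in range(20)]

before = precision(logged_outcomes(customers, 0.5, save_every=None))
after = precision(logged_outcomes(customers, 0.5, save_every=2))
print(before, after)
```

In this toy setup the model flags exactly the right customers in both runs. The measured precision still drops once interventions start, and a model retrained naively on those logs would learn that many high-risk customers stay.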

An Eternity of Updates

Greek mythology’s Sisyphus seems a fitting example. As punishment for his deceitfulness, the ex-king was condemned to an eternity of pushing a boulder up a hill, only to see it roll down again.

As Sisyphus must return to the bottom of the hill and start over, so too must good data scientists continuously return to their models, constantly readjust their assumptions and consistently push the boulder back up the hill.

When you build innovative prediction systems you are increasing your understanding of something (interpretable or not). If that system does not change behavior, then it provides no utility. If it does change behavior, then it will slowly become irrelevant.

It should become common practice to revisit our models on a regular basis. By doing so we ensure that our models continue to perform, and we also account for the very paradox their creation produces.
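One way to turn that habit into something routine is a simple monitor that tracks recent prediction accuracy and raises a flag when it slips. The sketch below is an illustration only: the class name, the window size, and the 0.8 threshold are assumptions, and a real system would likely track more than raw accuracy.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent predictions (hypothetical helper)."""

    def __init__(self, window: int, min_accuracy: float):
        self.results = deque(maxlen=window)  # oldest results drop off automatically
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual) -> None:
        """Store whether one prediction matched what actually happened."""
        self.results.append(predicted == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded yet."""
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def needs_review(self) -> bool:
        """True once the window is full and accuracy has fallen below the floor."""
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.accuracy() < self.min_accuracy


monitor = AccuracyMonitor(window=5, min_accuracy=0.8)

# Five correct predictions: the model still matches the world.
for _ in range(5):
    monitor.record(predicted=True, actual=True)
print(monitor.needs_review())

# Behavior shifts and the next two predictions miss.
for _ in range(2):
    monitor.record(predicted=True, actual=False)
print(monitor.accuracy(), monitor.needs_review())
```

A check like this does not fix a stale model. It only tells you when it is time to walk back down the hill.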

No model that serves a useful purpose can live in isolation.