Monday, October 15, 2018

Big data is destroying science


There is a lot of bullshit masquerading as science.
John Oliver

All possible explanations are true,
if not in our world, in any other place in the infinite universe.
Epicurus




Good news

Big data destroys verbiage. Death to all explanations and long live the empire of hard data! Let the algorithms run, release the inferences and throw away any model. Theory? No, yuck!


White shoes

The raven paradox —also known as Hempel's paradox— is perhaps one of the most famous paradoxes of confirmation (real or apparent). In the next paragraph, I paraphrase from page 173 of The Cambridge Dictionary of Philosophy (1999).

It is plausible to assume that the statement 'All ravens are black' can be incrementally confirmed by the observation of one of its instances, namely, a black raven. Now, 'All ravens are black' is logically equivalent to 'All non-black things are non-ravens.' By parity of reasoning, an instance of this second statement, namely, any non-black non-raven (e.g., a white shoe), should incrementally confirm it. Moreover, the equivalence condition (whatever confirms a hypothesis must equally confirm any statement logically equivalent to it) seems eminently reasonable. The analysis thus appears to license indoor ornithology, since observing a white shoe would seem to incrementally confirm the hypothesis that all ravens are black. The proposition 'All ravens are black' is just a theory, a model. Each such observation may strengthen it, but it cannot become an absolute truth until every non-black entity has been observed, and not just those in the shoe stores. All in all, as long as no albino raven appears, the theory works.
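The logical equivalence at the heart of the paradox can be checked by brute force on small toy worlds. The sketch below is only an illustration under stated assumptions: objects are reduced to two yes/no properties (is it a raven, is it black), and the names `KINDS`, `statements_agree`, `WHITE_SHOE`, `BLACK_RAVEN` and `ALBINO_RAVEN` are invented here for the example. It shows that the two statements never disagree, that a white shoe is compatible with both, and that a single albino raven refutes both.

```python
from itertools import product

# Assumption: an object is just a pair (is_raven, is_black).
# A "world" is a tuple of such objects.
KINDS = list(product([False, True], repeat=2))

WHITE_SHOE = (False, False)    # non-black non-raven
BLACK_RAVEN = (True, True)
ALBINO_RAVEN = (True, False)   # the counterexample


def all_ravens_are_black(world):
    """True if every raven in the world is black."""
    return all(is_black for is_raven, is_black in world if is_raven)


def all_nonblack_are_nonravens(world):
    """True if every non-black object in the world is a non-raven."""
    return all(not is_raven for is_raven, is_black in world if not is_black)


def statements_agree(max_objects=4):
    """Check both statements on every world with up to max_objects objects."""
    for n in range(max_objects + 1):
        for world in product(KINDS, repeat=n):
            if all_ravens_are_black(world) != all_nonblack_are_nonravens(world):
                return False
    return True


if __name__ == "__main__":
    print(statements_agree())                                    # True
    print(all_ravens_are_black((BLACK_RAVEN, WHITE_SHOE)))       # True
    print(all_ravens_are_black((BLACK_RAVEN, ALBINO_RAVEN)))     # False
```

The check only establishes that the two statements are true or false together. How much a white shoe actually confirms the hypothesis is a separate question, and the code says nothing about it.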


The knife and the box

In 1976, George Edward Pelham Box (1919-2013) published a paper in the Journal of the American Statistical Association that was widely acclaimed, because in it he formulated the first part of an aphorism that dataism has already elevated to the category of dogma: all models are wrong.

Since all models are wrong the scientist cannot obtain a "correct" one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity.

Of course, the reference to the Franciscan philosopher (1280-1349) alludes to the lex parsimoniae, or Occam's razor, which one hears quoted more and more often, although badly trimmed: 'the simplest explanation is usually the best.' That version is wrong because what the principle actually says is: all things being equal, the simplest explanation is usually the most probable. Emphasis: all things being equal. Double emphasis: the most probable, but not necessarily true.

Two years later, Box published another paper —Robustness in the strategy of scientific model building— in which he refined his aphorism: Essentially, all models are wrong, but some are useful.

Now it would be very remarkable if any system existing in the real world could be exactly represented by any simple model. However, cunningly chosen parsimonious models often do provide remarkably useful approximations.

For example, of course: 'All ravens are black'.


Colorful ravens

Arthur Oncken Lovejoy (1873-1962), the pioneer of the discipline known as the history of ideas, traced the philosophical principle that the universe necessarily contains all possible forms of existence, the principle of plenitude, which he called the "principle of the fullness of being" (The Great Chain of Being. Harvard University Press, 1936). In the Western tradition, the idea goes back to Plato (c. 427-347 BC), was taken up again by Epicurus (341-270 BC), and reached Spinoza (1632-1677), Gottfried Leibniz (1646-1716) and Kant (1724-1804), passing through Saint Augustine (354-430) and Giordano Bruno (1548-1600). Leibniz, for example, states in his Théodicée (1710) that in the best of all possible worlds (ours), every genuine possibility would be actualized, and the best of all possible worlds will contain all possibilities; or, in a few words, everything that can happen will happen… including, of course, an albino raven… and even some colorful ones. Therefore, every model will be insufficient.


Discontinued model

According to Burnham and Anderson (Model Selection and Multimodel Inference, 2002), it can be inferred from Box that even though models cannot be true, they can range from totally useless to tremendously useful. Certainly, 'All ravens are black' has been very useful, as long as we do not have information on all the ravens. But according to dataism we can already say goodbye to this pitiful stage of humanity, as we approach what some call the Petabyte Era. I read from Techopedia:

Petabyte Age refers to a futuristic age where the measurement of digital storage is available in petabytes (PB), each equal to 1,024 terabytes. During the age of the PB, scientific researchers will refrain from creating hypotheses or models and theory testing. Rather, advanced data mining will be used with PBs of data, available for reference.

Chris Anderson, then editor-in-chief of Wired magazine, wrote about it in 2008: "We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot" (The End of Theory: The Data Deluge Makes the Scientific Method Obsolete). Will this happen? Epicurus would not rule out the possibility…
