Some thoughts about why deep learning is interesting.
To me, the joy of research comes from beholding some fact, process, or algorithm with a sense of utter bewilderment about why it is true, why it happens, or how it can possibly work, and then chipping away at that mystery until the full answer fits into one’s head logically, mathematically, and emotionally. Recently, I’ve found deep learning to be a great source of mysteries that inspire this kind of bewilderment.
Those who have thought carefully about the endeavor of science, or who have at least familiarized themselves with the thinkers who have, will usually admit that the products of science are less “laws which govern nature” than they are mental models operating in the minds of scientists, by which those scientists can predict experimental observations. (At the very least, if there are such things as laws of nature, the relationship of those laws to the models of physicists is impossible to know.)
The scientific community has, as a culture and over time, progressed toward a set of biases with respect to which kinds of scientific models are acceptable and which are not. We now take many of these biases entirely for granted, to the extent that we find it difficult to imagine non-conforming models. For example, we favor reductionist models, wherein all effects that emerge at large scales can be traced to elemental interactions at smaller scales, over strongly emergentist models which allow for the existence of scale-dependent causal entities. We’ve arrived at this set of biases for the simple pragmatic reason that when we assemble models which accord with these principles, they tend to give predictions that hold true for new observations (that is, they generalize well). And after all, this property is what makes science useful.
Modern machine learning is fascinating to me because, in a way, it represents a playground for some of these ideas that have done battle over the centuries in the philosophy of science. In machine learning, we are acutely aware of the need to generalize, and also of the fact that, without constraining the space of explanations we are willing to consider, learning is impossible in general.
In this setting, along comes a technology like deep learning. Together with common training protocols, the simple architecture of a deep neural network represents a form of inductive bias that has proven broadly sufficient to find good models for a host of different phenomena across innumerable scales and domains. While there are hints of explanations as to why this is the case, I personally remain far from the desired state: fitting the full force and mathematical truth of the explanation into my head. And this is enticing.
A problem domain in which I continue to find the application of deep learning fascinating is that of inverse problems. An example: I bounce a laser beam off an object and send the reflected light through a scattering medium. When I record the scattered light with a camera, the result looks uniformly scrambled, with no residual features of the original object that you or I might recognize. Yet a deep neural network can successfully learn to recreate the source object, generalizing reasonably well to unseen scatterers and sources.
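To make the flavor of this concrete, here is a minimal sketch, assuming PyTorch, of the kind of supervised setup I have in mind: a small convolutional encoder-decoder trained on pairs of (scrambled measurement, source image). The architecture, image sizes, and training details are my own illustrative assumptions, not the setup of any particular experiment.

```python
# Minimal sketch (assumes PyTorch): learn to invert a scattering process by
# mapping speckle-like camera images back to the source objects that produced them.
# Everything here (layer sizes, 64x64 images, MSE loss) is an illustrative assumption.
import torch
import torch.nn as nn

class SpeckleInverter(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: compress the scrambled measurement into a low-resolution feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: upsample back to image resolution, predicting the source object.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_step(model, optimizer, speckle, source):
    """One supervised step on paired (scattered measurement, ground-truth object)."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(speckle), source)
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical usage with random stand-in tensors; real data would be camera
# frames of scattered light paired with images of the source objects.
model = SpeckleInverter()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
speckle = torch.rand(8, 1, 64, 64)   # batch of scrambled measurements
source = torch.rand(8, 1, 64, 64)    # corresponding source objects
print(train_step(model, optimizer, speckle, source))
```

The particular architecture is beside the point; the interesting thing is that, given enough paired examples, even a generic network like this can pick up on the statistical regularities of the scattering medium and undo a transformation that looks hopelessly scrambled to the eye.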
This impressive feat has stuck with me partly because, unlike the image classification setting, we cannot readily analogize to our own experience and imagine a decomposition of features by which the neural network solves the inverse problem. But perhaps more so, I like the example because it gives rise to a beautiful idea about the promise of learning in general: that under statistical regularity, hard problems become easy. There is also, I think, a more fascinating interpretation of what is happening here: assuming statistical regularity, for “typical” events, deep learning can automatically give us the “emergent models” which operate at the scale in question. Thus, in some ways, it provides a method of knowledge transformation across scales. An exciting idea!