NLP++ Versus ML

I was asked this question on the Natural Language Processing Crackers Facebook page, and this was my answer.

Could you give a very brief non-technical explanation of the limits of purely machine-learning approaches to NLP as background to this project, or point us to something you have written? Coming from a basic corpus linguistics background, it seems to me that too many people are selecting problems and tools that are too complicated or not feasible because they do not understand the conceptual problems involved in the nature of text and ML.

Adam Turner

My response:

The limits of ML for NLP are due to the reality that the linguistic knowledge and world knowledge in our brains take anywhere from 4 to 14 years (depending on the language) to learn. Layers of neurons or statistical algorithms looking at millions of texts are not going to solve this. If we build robots with brains, we will still have to teach them language and everything about the world around them. Language is a complex and changing thing that is intimately linked to the world model in our heads. The best way currently to write text analyzers that mimic what humans do is to think about a specific NLP task, find out what we as humans need in order to do it (in the most efficient way), and then encode it. That is why NLP++ is so valuable in my opinion. It allows for the direct coding of human knowledge and processing for each specific task, thus circumventing the need to create an “all-linguistic”, “all-knowing” program, something statistics and neural networks cannot mimic on their own.

David de Hilster