Some Brief Thoughts on Recent AI Trends
Large language models like ChatGPT and image-generating AIs like Stable Diffusion are certainly interesting, if nothing else. I personally seem to be a bit less hyped about them than most people, though I’ll save a more in-depth critique of AI in general for another time.
Today, I’ll just be touching on a couple of quick thoughts about them that I think could spark some interesting discussion outside of the usual arguments.
Outdated AI

There are a lot of interesting algorithms and data structures that were developed over the past few decades to solve the kinds of problems that are now often handed to deep learning. A few examples that come to mind are Locality-Sensitive Hashing and n-gram word embeddings. I’d also add an honorable mention for the “combinatorial coding” that the cerebellar granule cells in the brain seem to be doing, though that’s definitely not a human-designed algorithm.
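To give a feel for how lightweight these techniques can be, here’s a minimal sketch of one classic locality-sensitive hash - the random-hyperplane scheme for cosine similarity. The vectors and parameters are made up for illustration:

```python
import random

def hyperplane_lsh(dim, n_bits, seed=0):
    """Random-hyperplane LSH for cosine similarity: vectors pointing in
    similar directions land on the same side of most hyperplanes, so
    their bit signatures agree in most positions."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

    def signature(vec):
        # Each bit records which side of one random hyperplane `vec` falls on.
        return tuple(1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
                     for plane in planes)

    return signature

sig = hyperplane_lsh(dim=3, n_bits=16)
a = [1.0, 2.0, 3.0]
b = [1.1, 2.0, 2.9]    # nearly parallel to a
c = [-3.0, 1.0, -2.0]  # points in a very different direction
agree_ab = sum(x == y for x, y in zip(sig(a), sig(b)))
agree_ac = sum(x == y for x, y in zip(sig(a), sig(c)))
# Similar vectors tend to agree on far more bits than dissimilar ones.
```

Hashing is all it takes to then group near-duplicates into buckets - no gradient descent required.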
These techniques seem to have largely fallen by the wayside in favor of simply using deep learning instead. That isn’t to say they aren’t used at all, though, and in some cases they’ve even proven useful in combination with deep learning.
Apple’s NeuralHash algorithm, for example, uses a neural network to implement a perceptual hash, a subcategory of locality-sensitive hashes.
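The internals of NeuralHash itself aren’t public in detail, but a much simpler member of the same perceptual-hash family - the “average hash” - gives the flavor. Here’s a toy sketch, with a made-up 4x4 “image”:

```python
def average_hash(pixels):
    """Toy "average hash" (aHash), a simple perceptual hash: threshold
    each pixel of a small grayscale image against the image's mean
    brightness. Visually similar images give hashes that differ in
    only a few bit positions."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p >= mean else 0 for p in flat)

def hamming(h1, h2):
    # Number of differing bits between two hashes.
    return sum(x != y for x, y in zip(h1, h2))

# A made-up 4x4 "image": dark left half, bright right half.
img = [[10, 10, 200, 200] for _ in range(4)]
brighter = [[p + 5 for p in row] for row in img]
# Uniformly brightening the image doesn't change the hash at all.
assert hamming(average_hash(img), average_hash(brighter)) == 0
```

NeuralHash replaces the hand-crafted thresholding step with a learned network, but the output plays the same role: a short fingerprint that’s stable under small visual changes.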
In the case of n-grams, my understanding is that they’re still used quite a lot alongside neural networks, as they can be an efficient way of representing text, especially at the interface between the neural network and the human-written code driving it.
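As a concrete sketch of the kind of representation I mean, here are character-level n-grams in the style of fastText’s subword units: each word gets boundary markers, then gets sliced into overlapping windows (the word and window size are just examples):

```python
def char_ngrams(word, n=3):
    """fastText-style character n-grams: pad the word with boundary
    markers so prefixes and suffixes stay distinguishable, then slide
    a window of size n across it."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

# "where" becomes ['<wh', 'whe', 'her', 'ere', 're>'] - compact features
# that human-written code can index, count, or hash directly.
```

Features like these are trivial to compute and inspect, which is exactly what makes them a convenient interface layer.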
The benefit of these algorithms is that, for certain tasks, they accomplish a significant portion of what deep learning can, often at dramatically lower computational cost. Being more specific, hard-coded solutions, they’re also much easier to understand theoretically.
They do have their limits, and neural networks certainly beat them at many of the more difficult tasks, but I wouldn’t be surprised if a significant percentage of current “AI problems” could in fact be solved just as well by such unsexy techniques, while running millions or billions of times faster.
The fact that they’re a lot easier to understand and reason about than neural networks might also make them more debuggable and more reliable.
Local Consistency in Images and Wave Function Collapse
Many AI-generated images can look fairly realistic, though they often contain uncanny details that are slightly off. The wrong number of fingers on a human hand is a common issue; others include jagged or inconsistent horizons and unusually asymmetric details on buildings.
This behavior reminds me a lot of the results produced by the Wave Function Collapse algorithm, which is gaining popularity in game development as a powerful tool for procedural level generation. There are even variants of WFC that can extract details from a sample image and extrapolate them, generating game levels that look similar, while not being exactly the same.
It also happens to be an algorithm that relies on more traditional search and constraint-solving techniques (as well as some inspiration from quantum physics) rather than neural networks.
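To make the analogy concrete, here’s a heavily simplified one-dimensional sketch of the WFC idea: extract adjacency rules from a sample “level”, keep a set of candidate tiles per cell, repeatedly collapse the lowest-entropy cell, and propagate constraints. The tiles and sample are made up, and a real implementation would backtrack on contradictions rather than give up:

```python
import random

# Adjacency rules "learned" from a 1-D sample level: which tile may sit
# immediately to the right of which.
sample = "SSMMMSSLLLSS"  # S = sand, M = mountain, L = lake (made up)
allowed_right = {}
for a, b in zip(sample, sample[1:]):
    allowed_right.setdefault(a, set()).add(b)

def collapse(length, seed=0):
    rng = random.Random(seed)
    # Every cell starts in "superposition": all tiles still possible.
    cells = [set(sample) for _ in range(length)]
    while any(len(c) > 1 for c in cells):
        # Observe the most constrained undecided cell (lowest entropy).
        i = min((j for j, c in enumerate(cells) if len(c) > 1),
                key=lambda j: len(cells[j]))
        cells[i] = {rng.choice(sorted(cells[i]))}
        # Propagate constraints between neighbors until nothing changes.
        changed = True
        while changed:
            changed = False
            for j in range(length - 1):
                left, right = cells[j], cells[j + 1]
                new_right = {b for b in right
                             if any(b in allowed_right.get(a, ()) for a in left)}
                new_left = {a for a in left
                            if allowed_right.get(a, set()) & right}
                if not new_right or not new_left:
                    raise ValueError("contradiction - a real WFC would backtrack")
                if new_left != left or new_right != right:
                    cells[j], cells[j + 1] = new_left, new_right
                    changed = True
    return "".join(next(iter(c)) for c in cells)

level = collapse(20)
# Every adjacent pair of tiles in `level` also occurs somewhere in `sample`.
```

Notice there’s no training anywhere - just constraint propagation over statistics read straight out of the sample.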
This video is an introduction to WFC that I particularly like:
(I’d also like to point out that the Sudoku analogy used in the video is very similar to the one I used when discussing SAT solvers in a previous article - watching the video and reading the article together may help you draw connections that are less obvious from either individually.)
Once again, this is an example of more traditional algorithms accomplishing a significant portion of what neural networks can - here even showing similar flaws. Neural networks certainly have an advantage with long-range patterns: most of the structures WFC replicates are only a few tiles or pixels across, so there’s certainly some difference.
Perhaps there’s some overlap here in terms of what both techniques are doing, and WFC might be a good inspiration for attempts to reverse-engineer the inner workings of such image generators.
Computational Complexity

Most of the problems people aim deep learning at sit comfortably within the NP complexity class. While it’s difficult to assign a specific time complexity to a problem such as image recognition or answering text queries, NP is more or less the class of problems whose solutions can be verified in polynomial time - equivalently, the problems that arise when trying to run polynomial-time algorithms in reverse.
If you have a problem such as, say, converting text into images, and also want an algorithm for converting images back into text, these algorithms are ideally inverses of each other. Even if one of them is polynomial in complexity, there’s a good chance the other won’t be. The inverse of a P problem is often NP, while the inverse of an arbitrary NP problem is often just another NP problem. For a range of problems as broad as those neural networks are thrown at, it’s pretty clear that a lot of hard NP problems are going to be in the mix.
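Factoring isn’t known to be NP-hard, but it’s a familiar illustration of this forward/inverse asymmetry - the forward direction is one multiplication, while the naive inverse does work exponential in the bit length of the input:

```python
# Forward direction: multiplying two numbers is comfortably polynomial.
def forward(p, q):
    return p * q

# Inverse direction: recovering the factors by trial division takes on
# the order of sqrt(n) steps - exponential in the bit length of n.
def invert(n):
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return None  # no nontrivial factor: n is prime (or < 4)

n = forward(1009, 1013)  # a single multiplication
factors = invert(n)      # ~1000 trial divisions even for this tiny input
```

Scale the inputs up to thousands of bits and the forward direction stays instant while the naive inverse becomes hopeless - the same shape of asymmetry we should expect between, say, rendering and recognition.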
The reason I bring up complexity here is that, assuming P != NP (strictly speaking, the slightly stronger assumption that NP-hard problems lack small circuits), such problems should in general require superpolynomial circuit sizes to solve, which would translate directly into impractically large neural networks. Of course, this can be mitigated in various ways, such as using a neural network as a heuristic inside a more traditional search algorithm, or even just iterating a neural network until a solution is found. I expect hybrid tricks like these to become increasingly common, and to generally prove dramatically more efficient to train and run than their competition.
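The “iterate until a solution is found” idea can be sketched as a generate-and-test loop: a proposer suggests candidates, and a cheap polynomial-time verifier accepts or rejects them. Here the proposer is just random guessing standing in for a neural network, and the tiny CNF formula is made up:

```python
import random

# A tiny made-up CNF formula: a positive number means the variable,
# a negative number means its negation.
clauses = [(1, 2, -3), (-1, 3, 2), (-2, -3, 1)]

def verify(assignment):
    # Polynomial-time check: every clause needs one satisfied literal.
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)

def solve(propose, max_tries=1000):
    # Generate-and-test: keep asking the proposer until the verifier is happy.
    for _ in range(max_tries):
        candidate = propose()
        if verify(candidate):
            return candidate
    return None

# Random guessing stands in for the model; the point of the hybrid is
# that a trained network in this role would guess far better than chance.
rng = random.Random(0)
solution = solve(lambda: {v: rng.random() < 0.5 for v in (1, 2, 3)})
```

The loop’s output is guaranteed correct by the verifier regardless of how unreliable the proposer is - which is exactly the property that makes this pattern attractive for taming black-box models.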
Of course, if SAT solvers are anything to go by, we clearly have an extremely poor understanding of the NP complexity class, and even when worst-case performance is exponentially bad, typical instances can turn out far easier than we might expect.
If traditional search algorithms were cleverly combined with the aforementioned “outdated AI” techniques, I wouldn’t be surprised if it were possible to create something surprisingly competitive with existing neural networks on some problems. Such a task would require a strong understanding of the principles behind these different techniques and how they interact, plus a very detailed understanding - or at least hypothesis - of what problem is actually being solved: something much more difficult and involved than mere “backprop go brrrr”.
Even if it’s more difficult to build, this would probably result in a much deeper theoretical understanding of what black boxes like LLMs are actually doing, or at least some pretty plausible possibilities. That, and these engineered solutions might be a lot more efficient.
I might mess around with such a project at some point - we’ll see. I have a few ideas I’d like to try.
I also suspect that combining neural networks with classical search algorithms into hybrid systems more often could seriously outcompete both existing deep learning models and such a hypothetical engineered solution. I’m honestly surprised this isn’t done more already. Perhaps the next big leap in the technology after ChatGPT and its peers will be such an attempt.
Today’s article is perhaps a bit shorter than usual - I have a big list of articles I’d love to be writing, but the custom visualization tools I’m working on aren’t quite ready to do them justice. It’ll be a bit of a wait on those yet, so I’m going through my list of topic ideas and looking for ones that are light on visualizations. I have a few other articles in the works, but those weren’t coming together how I wanted and need a bit more time.
I hope you found today’s article interesting, and I appreciate your support. I’ve been getting a lot of positive feedback on the last few articles, which is definitely appreciated!