I had never heard of a vector database. I assumed this was a specialist type of database used for a particular problem domain, like a streaming database or graph database. There is a need for specialized platforms in certain situations, but I wasn’t sure what a vector was in this context. The description I saw said that vector databases “… are specifically designed to work with the unique characteristics of vector embeddings. They index data in a way that makes it easy to search and retrieve objects according to their numerical values.”
That sounds like any database. However, I saw a few more articles on the hype and then some details about the ways in which this type of database is helpful. Essentially, this is a database designed to store the outputs from various Artificial Intelligence (AI) and Machine Learning (ML) models that examine unstructured data. Things like images, video, audio, and even text are turned into numerical values, or vectors. The vector database is designed to help index and then search these vectors.
What is interesting about the possibilities here is that the entire image, video, or whatever isn’t turned into a single numerical hash of some sort. Instead, the AI/ML process might identify that Steve Jones is in this video, and that he is wearing a hat or a kilt. If I wanted to search for other videos of Steve Jones, or for other videos with the type of hat he’s wearing, a vector database can help. It’s much more powerful than simple tags that might be placed on a video because the details of the content are rendered into vectors, which can be compared to other vectors to find likely matches rather than only exact ones.
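To make "likely matches, not exact ones" a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the video names, the four-number vectors, and the meaning of each dimension are made up by hand. Real embeddings come out of a model and typically have hundreds or thousands of dimensions. Cosine similarity is one common way to score how close two vectors are. A real vector database would use a specialized index rather than scanning every row as this does.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-made "embeddings". The dimensions are pretend features:
# [Steve Jones present, wearing a hat, wearing a kilt, kitchen scene]
videos = {
    "conference_talk": [0.9, 0.8, 0.1, 0.0],
    "highland_games": [0.8, 0.1, 0.9, 0.0],
    "cooking_show": [0.1, 0.2, 0.0, 0.9],
}


def most_similar(query, items, top_k=2):
    """Brute-force nearest-neighbor search: score every item, keep the best."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Query: "Steve Jones wearing a hat". It matches no stored vector exactly.
query = [1.0, 0.9, 0.0, 0.0]
results = most_similar(query, videos)

for name, score in results:
    print(f"{name}: {score:.3f}")
```

The query is not equal to any stored vector, yet the search still ranks the closest videos first, which is the behavior a tag lookup can’t give you.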
One interesting example in the second link above is that content could be “vectorized” to determine if an apple in the content refers to a fruit or the company that Steve Jobs and Steve Wozniak made famous. That’s not easy to do with a tag, but it is more feasible with a vector database.
And lots of data. Lots of vectors specifically, and that inventory is growing all the time. As more software is built to analyze unstructured data, and as organizations collect more of it, applying database techniques to this data becomes increasingly important.
For those of us working with databases, I’d expect a lot of the mechanics of dealing with a database would still apply. Things like security, backups, and indexing will be needed with vector databases. We’ll get calls about slow performance, missing data, or strange results, and we’ll troubleshoot the system. How we do that specifically might vary, but those are just details we’ll work out.
I like the idea of new databases, which provide more tools, challenges, and opportunities for us as data professionals. I haven’t met anyone using a vector database yet, but I’m looking forward to the day when that happens.