The Role of Databases in the Era of AI

I’m hosting a webinar tomorrow with this same title: The Role of Databases in the Era of AI. Click the link to register and you’ll get some other perspectives from Microsoft and Rie Merritt.

However, I think this is an interesting topic and decided to try to synthesize some thoughts into an editorial today, partly to prep for tomorrow and partly because I’m fascinated by AI and how this technology will be used in the future.

The title says the role of databases, not data professionals. You might worry an AI is going to take your job as a DBA or developer, or you might think there is no way an AI can do your job. I tend to think the latter, but only if you are above average in your role and you add value by understanding your employer’s business. In those cases, the AI will help you (as a co-pilot, not a pilot) and allow you to get more work done or work done faster. You choose. If you churn out average or below-average work, or cut/paste from Stack Overflow or SQL Server Central or anywhere on the Internet, then yes, you should worry.

Databases store lots of information, and extracting it is hard. I see no shortage of poor data models, no shortage of overloaded data in fields, de-normalized structures, repeated information, and more. Humans jump through lots of hoops to build reports or screens or other interfaces to present to humans looking for answers. We may join data in Excel with values in a database or vice-versa. I’m sure many of you have plenty of stories on how you get data to move between some data store and a text format. I’m sure you also have no shortage of frustrations from your efforts.

AIs will get good at this. At the Small Data 2024 conference, I saw many people working on using AI without a semantic layer, which I think is possible, but will likely fail. We store data in too many crazy ways, and companies will need to make it easy for customers to create a semantic layer that describes what data is stored in each place. They’ll also get the AIs to help not only with this but with creating a way to simulate Master Data Management without requiring every application to use Redgate Software, Inc. as a name. We need to ensure Redgate, Red-gate, Redgate Software, and RG stored in different fields can all be joined as if they were the same value. Which they are.
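To make the name problem concrete, here is a minimal sketch of joining those variants without an AI at all, using only Python’s standard library. The alias table, the suffix list, and the 0.85 threshold are all assumptions for illustration; a real system would need far more of each, which is exactly the tedious work I expect AIs to help with.

```python
import re
from difflib import SequenceMatcher

# Hypothetical alias table: abbreviations that no string metric would match.
ALIASES = {"rg": "redgate"}

# Corporate suffixes dropped before comparing (an assumed, incomplete list).
SUFFIXES = {"inc", "software", "ltd", "llc", "corp"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation and corporate suffixes, then apply aliases."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    words = [w for w in cleaned.split() if w not in SUFFIXES]
    key = "".join(words)
    return ALIASES.get(key, key)


def same_entity(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two names as one entity when their normalized forms are close."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


variants = ["Redgate Software, Inc.", "Red-gate", "Redgate Software", "RG"]
for v in variants:
    print(v, "->", normalize(v), same_entity("Redgate", v))
```

The alias for RG is the part a string comparison cannot discover on its own, which is where feedback from humans (or a model that knows the business) has to come in.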

Fuzzy matching is the domain where AIs can shine, as the models can do this quicker than humans, without getting annoyed and with fewer mistakes. AIs can adapt with our feedback as we find ways to train the models better and overload the AI prompts with semantics that help translate the (extremely) poor data models in our databases, data lakes, spreadsheets, and even PDF documents. Companies that require a semantic layer can ease the process of building one with AI assistance so that customers can quickly start to query their wide array of data sources.
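As a rough illustration of what loading a prompt with semantics could look like, here is a small sketch that turns a hand-written data dictionary into context text placed ahead of a question. Every source, column name, and description below is invented for the example, and no model is called; the point is only the shape of the semantic layer.

```python
from dataclasses import dataclass


@dataclass
class FieldDescription:
    source: str   # where the data lives: a table, spreadsheet, or file
    column: str   # the (possibly cryptic) field name
    meaning: str  # what a human would call it


# Hypothetical semantic layer covering a database table and a spreadsheet.
SEMANTIC_LAYER = [
    FieldDescription("crm.dbo.Accounts", "acct_nm", "customer company name"),
    FieldDescription("billing.xlsx", "Client", "customer company name"),
    FieldDescription("crm.dbo.Accounts", "flag3", "1 if the customer has a support contract"),
]


def build_context(layer) -> str:
    """Render the semantic layer as plain text a model can read."""
    lines = ["Data dictionary:"]
    for f in layer:
        lines.append(f"- {f.source} / {f.column}: {f.meaning}")
    return "\n".join(lines)


def build_prompt(question: str, layer) -> str:
    """Put the data dictionary ahead of the user's question."""
    return f"{build_context(layer)}\n\nQuestion: {question}"


print(build_prompt("Which customers have a support contract?", SEMANTIC_LAYER))
```

Two differently named fields share one meaning here, which is the hint a model needs to join them.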

The best use I’ve seen for AIs is as an easy-to-use, context-aware, powerful search engine. When we learn how to tune these for specific sets of data, such as all the datastores and spreadsheets in a company, we’ll start to see some amazing gains in information analysis. I don’t know that humans will analyze any better than they do today, but the process of getting the information to analyze will be easier. I think AIs will also help in the analysis phase, but that’s going to require more co-work between humans and AIs to improve the quality of analysis.

There are other things, but I see databases as incredible stores of information that AIs will make easy to access. I’m also positive AIs will be used to more easily update information in databases and assist in easily moving data from one format to another or one location to another.

Tune into the webinar tomorrow and see what Microsoft thinks and ask any questions you have.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.

Webinar tomorrow: The Role of Databases in the Era of AI

I’m hosting a webinar tomorrow with Rie Merritt from Microsoft. We’ll be talking about some of the sessions that Microsoft has planned for the PASS Data Community Summit as well as a discussion of how AI is changing our world.

Register and I’ll see you tomorrow.

Serverless Gets Faster

When the Azure SQL Database serverless option was introduced, I was a bit disappointed that I couldn’t get the database to pause any sooner than 1 hour. That meant I needed to ensure clients didn’t access the system for an hour, but also, that I burned an hour of compute after the last access.

Recently I saw an announcement that this time frame has come down to 15 minutes. While this might seem like a very simple change from a technical standpoint (just alter a timer option), I’m sure there was more work needed. I’m also sure there was a lot of debate on the sales/marketing side to decide if this would lose a lot of revenue.

I’m sure this costs Azure some compute revenue in the short term, but it might also create opportunities from customers who consider using this in new situations since it can shut down quickly. I certainly think this makes the use of an Azure SQL database for QA/staging type work more attractive. This might also get more people to take a look at serverless and realize the auto-scale benefits are pretty cool.

My request would be to drop this down to 5 minutes and increase the range of auto-scale as well. Maybe allow me to go from 2-16 vCores if needed, with corresponding memory jumps. I don’t know that I need this by the minute, but I would like to have things shut down fairly quickly if we stop a workload and aren’t using the system.

I’d also like a better retry on startup other than trapping an error on the client and re-sending my request to connect. It’s just embarrassing that we still have that happening for a cloud PaaS service.
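For anyone who hasn’t had to write it, the client-side workaround I’m complaining about looks roughly like the sketch below. The `connect` argument is a stand-in for whatever your driver uses to open a connection, and `ConnectionError` stands in for the driver’s own error type; the attempt count and delays are assumptions, not recommended values.

```python
import random
import time


def connect_with_retry(connect, attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds, backing off between attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts, so surface the last error
            # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # A little jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 10))


# Demo with a fake connection that fails twice, as a paused database might
# while it resumes. The no-op sleep keeps the demo instant.
state = {"calls": 0}


def fake_connect():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("database is resuming")
    return "connected"


print(connect_with_retry(fake_connect, sleep=lambda seconds: None))
```

It works, but every client has to carry some version of this, which is the part that bothers me.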

Steve Jones

Everything is Code

I posted a note on Twitter/X with this quote: “The content updates had not previously been treated as code because they were strictly configuration information.” This is from testimony given by Crowdstrike to a US Congressional committee in trying to explain how they grounded much of the airline industry a few months ago. That was a mess of a situation, and apparently, the vendor didn’t think their configuration was part of their code.

That’s an amazing viewpoint to me. The fact that any developer or manager thinks that their configuration data isn’t a part of their code is worth testing. Yet, I see this attitude all the time, where developers, QA, managers, and more think that the code is the only thing that changes or doesn’t change, ignoring the fact that there are configuration items that affect the code and need to be managed appropriately. Certainly, if the config data were in enums rather than in a file or database they’d feel differently.

I think part of the reason that people try to ignore config data is that it is hard to manage. Often config data might change between dev, test, and prod. Dealing with that, and testing appropriately is hard. I haven’t ever seen a good solution for getting data into an environment the first time. That’s the hard part. Once the data is there, you can use it as a token where it is needed, and hopefully, the value has already been tested. At the very least, you can test how that data affects that environment.
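One modest way to treat config as code is to keep every environment’s values in version control and run the same checks against all of them before anything is promoted. The sketch below assumes made-up setting names and rules; it is not how any particular vendor does this, just the shape of the idea.

```python
# Hypothetical per-environment settings, stored alongside the code.
CONFIGS = {
    "dev": {"db_host": "localhost", "pool_size": 5, "timeout_seconds": 30},
    "test": {"db_host": "test-db.internal", "pool_size": 10, "timeout_seconds": 30},
    "prod": {"db_host": "prod-db.internal", "pool_size": 50, "timeout_seconds": 15},
}

# Every environment must supply exactly these keys with these types.
REQUIRED = {"db_host": str, "pool_size": int, "timeout_seconds": int}


def validate(config):
    """Return a list of problems; an empty list means the config passed."""
    problems = []
    for key, expected in REQUIRED.items():
        if key not in config:
            problems.append(f"missing {key}")
        elif not isinstance(config[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    for key in sorted(config.keys() - REQUIRED.keys()):
        problems.append(f"unexpected {key}")
    pool = config.get("pool_size")
    if isinstance(pool, int) and pool <= 0:
        problems.append("pool_size must be positive")
    return problems


# Run the same checks for every environment, as a CI step might.
for env, config in CONFIGS.items():
    print(env, validate(config) or "ok")
```

This only checks shape, not behavior. It doesn’t solve getting the data into an environment the first time, but it does mean a malformed value fails a check in a PR rather than in production.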

I am glad to see Crowdstrike publicly recognize that they need to dogfood not only their code changes but also their config changes. However, for a company that hasn’t shown a rigorous engineering approach, I suspect they’ll test very simple and basic config changes and not necessarily do a good job of carefully testing a variety of potential problem vectors. That takes work, and excluding config data from testing is a sign (to me) of a technology group trying to avoid doing too much work. It’s likely more management and leadership than technology workers, but the entire organization is showing signs of shortcutting good engineering.

My view is that developers should be free to experiment and try lots of things, and have a lot of freedom on how they build software. I think the same thing for infrastructure people as well. However, as we start to move our changes towards production, everything should be in code, version-controlled, and promoted through PRs. In other words, everything gets stored as code and nothing gets changed outside of development. A change is only approved to move forward, or rejected, after it’s well tested.

That’s a tough process to implement, and one many companies don’t spend the time doing, but those that do end up deploying far fewer bugs.

Steve Jones
