Keep. It. Simple.

I get a tech newsletter most days with news that I enjoy, but interspersed among the news and ads are projects, frameworks, and repos, most of which I’ve never heard of before. I used to read these, but there seems to be an endless list of them, all with marketing descriptions that somehow claim this set of code solves problems that others don’t, or that this code is easy to use and integrate with, or, well, I don’t know what other promises. I’m usually turned off by the end of the first sentence.

The thing I’ve noticed is that there are so many projects out there. Even in the database space, if I happen to read a discussion on some aspect of databases, such as database deployment frameworks, I’ll see links to technologies I’ve never heard of in my life. Some are small projects, and some are small companies, but there is an amazing variety of solutions for any tech problem. I’m not sure most of them are much different from the others, but the Not Invented Here syndrome seems to be everywhere. These observations also remind me of just how vast the world is and how little I see of it on a daily basis.

I ran across an article recently on simplifying the way we build software. In this instance, the authors focused on the complexity of JavaScript frameworks, with a number of authors/developers/consultants saying they limit their use of frameworks and try to keep to simple technologies. There is a quote that I find myself agreeing with and disagreeing with at the same time from David Heinemeier Hansson: “The merchants of complexity will try to convince you that you can’t do anything yourself these days. You can’t do auth, you can’t do scale, you can’t run a database, you can’t connect a computer to the internet. You’re a helpless peon who should just buy their wares. No. Reject.”

I do think that many frameworks, products, etc. are very complex and there is a cost to using them. They can be overwhelming and hard to teach to others, and those costs can outweigh the benefits. The costs could be actual money or they could be the time it takes to learn, integrate, and teach the technology to others. Any change you make to your architecture will have impacts that you haven’t considered, and it will slow down your agility. Adding new technology does the same thing.

At the same time, some things are hard. Auth is hard. Lots of people struggle with this, and Troy Hunt has made a good living at https://haveibeenpwned.com/ pointing out all the poor decisions people have made, including a bunch that used the auth provided by some tech stack. Running databases is hard. Can you run one? Sure. Will you run it well as your app grows? Experience shows me most of you can’t. Can you set up a simple backup strategy? Of course, but that doesn’t matter. Being able to restore things is what matters. There are plenty more examples of tech problems or failures that companies run into as they build and scale software. There are also plenty of examples where using a framework saves a bunch of time for development teams. Look at mobile apps: most of them are based on some framework because learning all the ins and outs of iOS and Android is too hard for most people.

For. Most. People.

DHH can do a lot of things that most people can’t. He’s very talented and smart. Lots of organizations have one or two very talented people, but they also have a lot of average people. They struggle to teach and train them, and ultimately, they don’t want to spend time teaching everyone how to manage CSS, or authenticate a user from a service, or deploy changes to a database. They want a framework that makes it simple for their ever-changing staff to get work done related to their business, which more often than not isn’t building software. They need to build software, but they need the software built to serve their customers in some way. That’s their goal: serving customers.

I am all for keeping things simple. I want to minimize the number of technologies we use, and I want to limit the training time it takes to get people productive. However, I don’t want to build everything myself because I’m not sure I can, and I certainly know I don’t want to go to that much effort.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.

Posted in Editorial | Tagged | Comments Off on Keep. It. Simple.

T-SQL Tuesday #179: The Detective’s Toolkit

It’s time for the monthly T-SQL Tuesday blog party. This month a longtime friend, Tim Mitchell, is hosting, and he’s got a neat invite. He’s asking us how we dig into a request and figure out the meaning in our database, our files, or something else. How do you solve a data mystery?

If you haven’t participated in a T-SQL Tuesday blog party, start a blog and join us. If you’ve written a post, host one of the months in 2025. Don’t get too caught up in the rules. At this point, I’m just trying to get people to participate.

Inspector Gadget

For some reason, the first thing that comes to mind is Inspector Gadget. My kids used to love this show and I can still hear the song and catchphrase in my head.

Being a detective doesn’t require gadgets, but they do make things easier. As Tim asks in the invitation, how do you go about learning about your data?

One of the main things I’ve done in the past is use Notepad++ to load a large text file and look at it. These days VS Code works well for this, and I can get an idea of the structure of text files.

If I’m looking for meaning in a database, often I need two things:

  1. An Extended Events session
  2. A business analyst or end-user

Usually I only look at data in databases when someone asks me to find something or explain something. Often, this means relating something in an application to a data model that might not be intuitive, obvious, or even rational.

The big way for me to figure out where data is stored is to get a user to show me something in the application, or tell me what data they see from their side. Then when they query something, I can trace what happened with xEvents and see what tables were queried.
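A minimal sketch of that kind of Extended Events session might look like this. The session name and database name here are placeholders, and a ring buffer target is just one simple choice; adjust the events and predicates for your own environment:

```sql
-- Capture completed batches and RPC calls from the app's database
-- so we can see which tables a given screen actually touches.
CREATE EVENT SESSION [TraceUserQueries] ON SERVER
ADD EVENT sqlserver.sql_batch_completed (
    ACTION (sqlserver.client_app_name, sqlserver.sql_text)
    WHERE sqlserver.database_name = N'MyAppDatabase'   -- placeholder name
),
ADD EVENT sqlserver.rpc_completed (
    ACTION (sqlserver.client_app_name, sqlserver.sql_text)
    WHERE sqlserver.database_name = N'MyAppDatabase'
)
ADD TARGET package0.ring_buffer;
GO

ALTER EVENT SESSION [TraceUserQueries] ON SERVER STATE = START;
```

With the session running, I have the user click around in the application and then look at the captured sql_text to see which tables were hit.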

That does most of the detective work for me, though sometimes I’m doing some trial and error, as any specific screen in an application might query lots of tables.

Posted in Blog | Tagged , | Comments Off on T-SQL Tuesday #179: The Detective’s Toolkit

Moving T-SQL Tuesday to Jekyll

I got a message a few months back that Microsoft was deprecating the MySQL server version that I was using in Azure. The cost was going up, and while I don’t mind donating the $12-15/month to the site, I didn’t want this to be really expensive.

It’s also a perfect fit for a static site, as the content rarely changes and doesn’t really need database access to serve pages. Jekyll, which powers a number of other sites (SQL Saturday, SQL Memorial), seemed like a good fit.

This covers a few of the things I did to move the site, which was surprisingly easier than I expected.

Exporting the Data

There is a plugin on the WordPress marketplace that you can add to your site to export your data as markdown for Jekyll. I tried to add it to the live site, but the plugin didn’t want to activate. My PHP version was too old.

I could upgrade PHP, but I might break something, and ultimately, I don’t need another job.

Instead, I decided to run the site locally. That’s easy with a container, and I found this post on running WP in a container. A couple of minutes after the Docker image downloaded, I had a new WP site running on my laptop, with the latest PHP and WP versions.

While the image was downloading, I used the main WP export to download a copy of the data.

The next step was to import this locally. Since I had a brand new WP site in a container, the easiest thing was an import rather than a database restore. I’m a database pro, but I prefer simple and efficient processes.

I picked the export file and let it run.

Once I ran the import, I could see the same site in my local container as I see at tsqltuesday.com. Now I needed a plugin. When I searched around, the Jekyll Exporter was recommended, but I couldn’t install it on the older version of WP.

However, in my container, this worked fine.

Once the plugin was installed, I could see it in my list.

I also had a new menu item under Tools: Export to Jekyll.

Once I picked this, I got a file to download (from the container to my local drive), and all of my files were zipped up.

Inside this file, I had a bunch of folders structured for Jekyll. _posts is the main folder, where the various posts from your WP site are stored as .md (markdown) files.

Now that we have a set of Jekyll compatible files, let’s get this moving to GitHub.

Loading into Git

The first step was to create a repository on GitHub. I’m moving this to the SQL Saturday organization, so it can be a part of the charitable foundation (and I can hand it to someone at some point).

I created the repo and then cloned the empty repo to my local drive. If you want to help improve the site or change things, fork this repo and then you can have your own URL.

I then copied my exported structure into this folder, but I had some issues rendering with the default GitHub themes. So I deleted everything and then grabbed a fork of Beautiful Jekyll, which is the same theme running the SQL Saturday site. Essentially I copied all of the Beautiful Jekyll files to this local repo, then copied the T-SQL Tuesday export on top of those files and committed the files.

In GitHub, I also added the automation for Actions by grabbing the workflows folder and adding it. In the Pages setup, I then enabled Pages, which gave me a .github.io URL, which is fine for testing.

This worked, as once I committed code, I saw a build.

And it worked well.

Sort of. The theme stuff didn’t quite work, so I ended up copying over the SQL Saturday site with the Beautiful Jekyll theme, deleting its posts, and adding back the T-SQL Tuesday content.

I had some work to get the site working well, and when I did, I moved it. I created an official repo here where people can send PRs if needed: https://github.com/sqlsaturday/tsqltuesday

Once I had things here, I had a little DNS work to do for the custom domain, but T-SQL Tuesday was again live at https://tsqltuesday.com/

And with an SSL cert, thanks to GitHub. However, I wasn’t done.

Data Cleanup

While I was hoping the plugin would convert things easily, I realized quickly there were a bunch of problems. Not a crazy number, but some.

The data cleanup was a manual process. I could perhaps have found a way to parse and automate this, but with 174 invites, this just wasn’t enough work for me to spend time building and testing something. Plus, I needed to get this done before the next month.

First, I had a bunch of posts converted with long WordPress-style URLs. That’s fine, but there was inconsistency, as some posts weren’t linked this way. I also had some links that were absolute, using the https://tsqltuesday.azurewebsites.net/ URL as a base.

I decided to simplify things.

First, I added a permalink to each page, similar to what you see in this one: https://raw.githubusercontent.com/sqlsaturday/tsqltuesday/refs/heads/main/_posts/2009-12-01-t-sql-tuesday-001-datetime-tricks.md

I set this to be /xxx, where xxx is the invite number. While this might be a problem if we get to 1000, I certainly won’t be dealing with that one.
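As a rough sketch, the front matter at the top of an invite’s markdown file might look something like this; the layout and title here are placeholders, and the permalink line is the part that matters:

```yaml
---
layout: post
title: "T-SQL Tuesday #42: A Placeholder Title"
permalink: /42
---
```

Jekyll then serves that post at /42 regardless of the long WordPress-style filename.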

Next, some links were absolute, pointing to the old site on Azure. I changed these, mostly in the Host Index, to relative links using a simple formula for the posts: /xxx, where xxx is the invite number. This is a simple, clean way of managing the links.

Lastly, I had issues with some individual posts, where leftover HTML tags appeared in the exported markdown. That HTML wasn’t rendering correctly, so I had to delete those items, and, of course, the matching end tags.

I also had issues where people would write something like Steve (twitter | blog), and the pipe character was interpreted as a table delimiter. I had to change those to /.
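For example, the first line below can be picked up as a table row by the markdown renderer, while the second renders as the plain text that was intended:

```markdown
Steve (twitter | blog)
Steve (twitter / blog)
```

Escaping the pipe as \| would also work, but a plain / was the simpler bulk fix.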

Savings

The old costs, which usually came out of my MVP subscription, were about $12/mo for the db and $9 for the site. The requirement to move to a Flex server increased the db to about $23/mo.

The new GitHub site costs $0. Perfect for a charitable endeavor.

Posted in Blog | Tagged , , | 2 Comments

The Load of Real Time Data Warehouses

If you have a data warehouse, what do you think your ratio of reads to writes is on any given day? Do you think 1:1, as in one read for each write? Is it 10:1, with 10 reads for each write? 100:1? Do you track this in any way?

One would think that most of the databases we work on in the transactional world have many more reads than writes. I’d have assumed the ratios might be higher for data warehouses, where we load data that is queried (read) as the primary use case. After all, I expect that there are lots of people querying data that is loaded into this warehouse, with relatively few changes.

I saw a presentation recently of a paper from Amazon that analyzed Redshift workloads (Why TPC is Not Enough). Redshift is the Amazon AWS data warehousing platform that has proven to be very popular in many companies as a cloud warehouse. One interesting piece of analysis is that there are more reads than writes, but barely. There is a lot of insert/update/delete activity, which is different than the TPC benchmarks for warehouses (TPC-H and TPC-DS). Those benchmarks tend to be 75+% reads. Hence the paper and analysis.

However, real-world Redshift workloads aren’t like those benchmarks. I had a chance to chat with one of the authors, as well as another ETL specialist, and they noted that the pipelines many companies use aren’t running in batch mode, but move data almost constantly. Combining this with relatively few queries from smaller groups of people results in a ratio of nearly 1:1 for reads to writes. That was quite surprising to me.

Presumably, some of this is because people will run relatively large queries for a report, and then spend time analyzing the data while the ETL/ELT processes continue to move data to the warehouse. Much of the design for storing data in warehouses, especially columnar ones, is with the idea that the data doesn’t change much. Certainly, columnstore indexes perform better when they are being read, not necessarily when they are updated.

I wonder how much of this architecture of constant writes and updates has driven the world towards a Lakehouse architecture where data is written out and then transformed into cleaner versions that are copies, albeit some of them smaller than the original. Often this data is also written to separate delta/parquet files as well, which means new writes usually occur to discrete objects rather than a managed database structure. From a logical thought standpoint, that seems to make more sense to me, even though I still tend to think most of us could use a relational structure for warehousing data.

If you have a real-time warehouse being updated, perhaps you want to measure your ratio of reads to writes and possibly rethink how to manage the system. If you don’t have a specific warehouse, which is most of us, you probably like querying the real-time updates of your transactional data in the same place as it’s stored. Either way, it’s interesting to think about the impact of a real-time workload on the performance your users experience.
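If you want a rough sense of that ratio on SQL Server, a sketch like this is one starting point. It uses the index usage DMV, so the numbers cover activity since the last restart and are an approximation of workload, not an exact measure:

```sql
-- Approximate reads vs. writes for the current database since the last restart.
-- Reads = seeks + scans + lookups; writes = user_updates (insert/update/delete).
SELECT
    SUM(user_seeks + user_scans + user_lookups) AS total_reads,
    SUM(user_updates)                           AS total_writes,
    CAST(SUM(user_seeks + user_scans + user_lookups) AS decimal(18, 2))
        / NULLIF(SUM(user_updates), 0)          AS read_write_ratio
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID();
```

If the ratio is close to 1:1 on a warehouse, that is a hint the constant-load pipeline, not the queries, is dominating the system.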

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.

Posted in Editorial | Tagged | Comments Off on The Load of Real Time Data Warehouses