Using Flyway with Feature Flags

There is a nice article at Harness.io on their use of feature flags and how they deployed their next generation experience. It’s worth a read if you want to improve your database deployment experience, especially if you want to control how and when you release to customers.

A nice bonus is that they mention Flyway as a tool to help them manage database code changes. A hint from me: you can use Flyway to manage not only DDL changes but DML changes as well.

I talk about this in my Architecting Zero Downtime Database Deployments talk. Most of the time when you have changes that might disrupt users, take a lot of time, or require coordination with different apps/systems, using feature flags and backwards compatible deployments is what makes zero downtime, or minimal downtime, possible. If your changes can’t be backwards compatible, you break the database changes into multiple steps that are backwards compatible.
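To make that concrete, here is a minimal sketch of how a feature flag can gate the read path during a backwards-compatible column rename. The flag name, columns, and dictionary-backed flag store are all hypothetical; the point is that the old and new columns coexist until every app has moved over, so the flag can be flipped per environment without downtime.

```python
# Hypothetical in-memory flag store; a real system would use a flag service.
FEATURE_FLAGS = {"use_full_name_column": False}  # flipped per environment

def get_customer_name(row: dict) -> str:
    """Read from the new column when the flag is on, else use the old schema."""
    if FEATURE_FLAGS["use_full_name_column"]:
        return row["full_name"]                         # new column (added in step 1)
    return row["first_name"] + " " + row["last_name"]   # old columns (dropped in the last step)

# Both schemas are present during the transition, so either path works.
row = {"first_name": "Ada", "last_name": "Lovelace", "full_name": "Ada Lovelace"}
print(get_customer_name(row))  # old path while the flag is off
FEATURE_FLAGS["use_full_name_column"] = True
print(get_customer_name(row))  # new path once the flag is flipped
```

Only after the flag has been on everywhere do you deploy the final script that drops the old columns.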

Flyway makes it easy to ensure scripts run once, run in order, and are deployed as a group or as individual transactions, so you can test your scripts consistently across multiple environments.
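A quick sketch of what that ordering looks like in practice: Flyway's versioned naming convention (V&lt;version&gt;__&lt;description&gt;.sql) determines run order numerically, so V10 runs after V2, not before it as a plain alphabetical sort would have it. The filenames below are made up for illustration.

```python
def version_of(filename: str) -> tuple:
    # "V2.1__add_flag_column.sql" -> (2, 1)
    ver = filename.split("__")[0].lstrip("V")
    return tuple(int(p) for p in ver.split("."))

# Hypothetical migration scripts, deliberately out of order
scripts = [
    "V2__add_full_name_column.sql",
    "V10__drop_old_name_columns.sql",
    "V1__create_customer_table.sql",
]

# Numeric sort on the version prefix, the way Flyway orders migrations
ordered = sorted(scripts, key=version_of)
print(ordered)  # V1, then V2, then V10
```

Flyway also records each applied script in its schema history table, which is how "run once" is enforced across environments.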

I know I sound a little sales-y there, but Flyway is a fantastic tool, and I was pro-purchase when Redgate bought it. I was also pushing to replace some other internally-developed deployment tech with Flyway in all our products, which we’ve done.

The big way Flyway supports feature flags is by letting you break up your changes into separate scripts and limit when they run. This is best managed with branches in your VCS and PRs, but it will also be handled by the new Deployment Rules, which are in preview.


Invisible Downtime

This article has a concept I’ve never heard about: invisible downtime. This is the idea that there are problems in your application that the customer sees, but that your monitoring doesn’t flag. Your servers are running, but the application doesn’t work correctly or is pausing with delays that impact customers. From an IT perspective, the SLA is being met and there aren’t any problems. From the customer’s viewpoint, they’re ready to start looking at a competitor’s offering.

Lots of developers and operations people know there are issues in our systems. We know networks go down or connectivity to some service is delayed. We also know the database gets slow, or at least, slower than we’d like. We know there is poor-performing code and under-sized hardware, running on storage that doesn’t produce as many IOPS as our workload demands. We would also like time to fix these issues, but often we aren’t given any resources.

The current buzzword among executives and senior IT leaders is observability. It’s the goal of looking at how our entire system (application, database, and network) is linked together and performing, with an eye on improving performance. Not because they want to spend time or money here, but because customers are becoming more fickle and quicker to move to another offering. Leaders know that degraded application performance (another phrase for invisible downtime) can have real bottom-line impacts on revenue.

There are a lot of products in this space, application performance monitoring (APM), designed to look at lines of code and determine how well each is performing. They can help you spot issues in application code, but they lack insight into database and network details, at least at a level that the experts need. As a result, digging into performance issues and root cause analysis of problems usually means pulling data from multiple sources and correlating log entries.
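As a rough illustration of that correlation work, here is a stdlib sketch that merges application and database log entries into a single timeline by timestamp. The log formats and messages are invented; real logs would need per-source parsing before they could be merged like this.

```python
from datetime import datetime

# Invented sample entries from two separate log sources
app_log = [
    ("2024-05-01T10:00:01", "app", "request /orders started"),
    ("2024-05-01T10:00:09", "app", "request /orders finished (8s)"),
]
db_log = [
    ("2024-05-01T10:00:02", "db", "query orders_by_customer started"),
    ("2024-05-01T10:00:08", "db", "query finished, 6s elapsed"),
]

# Merge both sources into one timeline, sorted by parsed timestamp,
# so the slow query lines up with the slow request around it
timeline = sorted(app_log + db_log,
                  key=lambda e: datetime.fromisoformat(e[0]))
for ts, source, message in timeline:
    print(ts, source.upper(), message)
```

Even this toy version shows why the manual approach doesn’t scale: every new source means another format to parse and another clock to trust.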

This is likely an area where AI/ML technologies can help, especially across large estates, though I think in many cases, what we need is just a pointer to poor-performing code. C#, Java, SQL, whatever. We need to know where the bad code is and then we need to train developers to write more efficient code. That might be the best way to improve application and database performance.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.


Writing Parquet Files – #SQLNewBlogger

Recently I’ve been looking at archiving some data from SQL Saturday, possibly querying it, and perhaps building a data warehouse of sorts. The modern view of data warehousing seems to be built on a Lakehouse architecture, where data moves through different phases, but much of the data is stored in text files, often parquet files.

As a start, I decided to try moving data to parquet. This post looks at writing parquet files.

Another simple post from me that hopefully serves as an example for people trying to start blogging as #SQLNewBloggers.

Writing Parquet Files

In a previous post I looked at reading in JSON data, which is how some of my data is archived. I also talked about importing modules. There is a module called pyarrow that lets me work with various parts of Apache Arrow.

One of the submodules in pyarrow is the parquet module, which lets me read and write parquet files. So, let’s get those modules.

import pyarrow as pa
import pyarrow.parquet as pq

I am giving these short names (aliases) so I can refer to them in code. Now, let’s skip the code from the previous article and assume I’ve got a dataframe with my sessions in it. How do I get a parquet file?

Fortunately, I don’t need to know anything about the physical structure, as I can use the write_table() function from the parquet module to do that. I’ll also use the pyarrow.Table.from_pandas() function to get data from the dataframe into a pyarrow table. This code does that (with some setup for a filename).

    # f, outPath, and df come from the loop in the previous article
    outputFilename = f + '.parquet'
    outputFile = join(outPath, outputFilename)
    # convert the pandas dataframe to a pyarrow Table
    pqtable = pa.Table.from_pandas(df)
    # write the Arrow Table to a Parquet file
    pq.write_table(pqtable, outputFile)

Note: I don’t know the technical differences between how pandas dataframes and pyarrow tables work. I found a few notes online, and it looks like pyarrow tables can handle more complex data structures.

Once this code is added to the code from the previous article (it’s already indented), this will write .parquet files to the bronze folder underneath the location from where it is run. In essence, this takes data from the raw folder and writes it to bronze in a new format.

Summary

This post shows how to write parquet files out from JSON data. Take the previous article and this one and you can move data from JSON to parquet.

This code isn’t perfect. In fact, it needs work. I am only moving session data, so only a portion of the JSON data. This code should be enhanced, or the file names changed to reflect that, but for now, this is a quick example of producing parquet data.

SQL New Blogger

This post took about 10 minutes to write once I had the code working. In fact, adding these functions to the code from the last article only took a few minutes. I had to debug a few things to get the files into the correct folder, but it took longer to get these words down than to get the code working.

Not a lot longer, but longer.

You can do this. If you want to work in modern technologies, learn them. Learn how to work with parquet, which is being used a lot in data warehousing, and then write about it. Prove you can get things done and your current employer, or your next one, might give you a project to actually do this work.


Kubernetes is Cool, But …

Kubernetes is cool, and I think it’s really useful in helping us scale and manage multiple systems easily in a fault-tolerant way. Actually, I don’t think Kubernetes itself is what’s important; rather, the idea of an orchestration engine to manage containers and systems is what really matters. As a side note, there are other orchestrators, such as Mesos, OpenShift, and Nomad.

However, do we need to know Kubernetes to use it for databases? This is a data platform newsletter, and most of us work with databases in some way. I do see more databases moving to the cloud, and a few moving to containers. I was thinking about this when I saw a Simple Talk article on Kubernetes for Complete Beginners. It’s a basic article that looks at what the platform consists of, how it works, and how to set up a mini Kubernetes platform on your system. It’s well written and interesting, but …

Do we need to know anything about it? Are we running databases in containers, or will we? I think it’s possible that we might run any of our databases in containers. They are like lightweight VMs, and there isn’t a reason why we wouldn’t run a database in one. With external storage, of course, which gives you a cluster-like environment where the database can come back up on a new node, attached to the same storage, if the first one fails. That’s a good use case. Deploying consistent environments quickly is a good use case. Using Kubernetes to manage the containers is great, but …

I don’t think we need to know much about Kubernetes. I don’t think most of us should run it; we should outsource any container orchestration to the cloud if we decide to implement database containers. These orchestration engines are quite complex today, and a lot of expertise is needed to manage them. I don’t know that that expertise is worth trying to find, train, and retain for most organizations. We should just outsource the container management to someone else.

We might need to know how to change the configuration of some resources, but that’s minor knowledge, and really, I suspect that outsourced K8s (shorthand for Kubernetes) will have GUI tools that let you easily pick the CPUs, memory, etc., and then export the JSON or YAML or whatever is needed for the config. Most of us likely need the skills to export the files, save them in a VCS, and then submit them to the cluster.
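As a hypothetical illustration (not from any particular cloud provider’s export), the part of such a config most of us would actually touch is small: the resource requests and limits on the database container, something like:

```yaml
# Fragment of a pod spec; names and values are made up for illustration
resources:
  requests:
    cpu: "2"
    memory: 8Gi
  limits:
    cpu: "4"
    memory: 16Gi
```

Everything around that fragment is the complexity I’d rather leave to whoever runs the cluster.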

A few years ago I went through a bunch of courses and reading material on Kubernetes. I set up some small clusters, I experimented with pods, and I was even excited to think about managing containers for various services. What I discovered is that Kubernetes is complex, hard, and something I want someone else to run. Once I set up a cluster in Azure, I thought I’d never want to do this on-premises again.

Much like email. I have run email servers, but …

I’d like to never run one again, which is how I feel about Kubernetes.

I think containers have proven more complex and harder to work with than many people expected. I know there are plenty of people using them, but they’re a minority. I see many more organizations still building monoliths, microservices that run as processes, or client-server apps. Not that many people are excited about and using containers. That may change, and if you go to the cloud, containers give you portability that many other solutions don’t, so I’d recommend them there. However, they are still a bit immature and hard to manage. I think it will be a while before we see lots of databases in containers.

Even if we do, I’m not sure we need to learn Kubernetes as database people.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.
