Big Data or Small Data

I went to San Francisco for Small Data SF, a conference sponsored by MotherDuck. The premise of the event was that smaller sets of data are both very useful and prevalent. The manifesto speaks to me, as I am a big fan of smaller sets of data. I also think that most of the time we can use less data than we think we need, especially when it’s recent data. Recent data is often more relevant, yet we end up with contorted queries that try to weight new and old data differently to reflect this. Maybe the best line for me is this one:

Bigger data has an opportunity cost: Time.

I think time is a very valuable commodity and large sets of data can slow you down. There’s also the chance that looking at too much data starts to blur the lines of understanding. We may start to miss information in our dataset, or we may find people arguing about different things the data means, because we have so much data that we can find support for any position somewhere in the vast sea of numbers, strings, and dates.

Big data also has a real cost in resources, often money. One of the examples was from the organizer, who once gave a demo on stage, querying a PB of data. That’s impressive, and lots of us would want to be able to query our very-large-but-less-than-PB-sized data in minutes. However, the thing that wasn’t disclosed in the demo was the query cost: over USD $5k.

I’ve heard from a number of customers and speakers that most people don’t have big data. Most of us have 100s-of-GB-sized working sets of data, sometimes with TB-sized archives in the same database that slow everything down. If we could easily extract out the useful data, we could query those hundreds of GB more efficiently.

This is especially true in the era of small devices that can handle something close to a TB of data in a small form factor. With some of the columnar systems that compress data, a TB of raw data might be substantially compressed in Parquet files or an analysis system like DuckDB. In that case, we might realistically search and analyze 1TB of data on a laptop.

I know that big data is relative, but many of us face challenges with data sizes and query performance. I know lots of you embrace the challenge and see working with TB (or larger) systems as a badge of honor. I also know the reality is that most of us struggle to separate our archive data from current working data in our systems. However, if we could, would most of you want to work with smaller data sets or do you enjoy large ones? I know which way I lean.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.


My 2024 in Data: Reading

This is my last week of the year working (I guess I come back on the 30th for a minute), so I decided to do some analysis of my year. I like data and numbers, so I’m looking at a few aspects of life this year with data I’ve compiled from previous years.

Today I’m looking at the books I read. I read a lot to get away from my chaotic life and this has been one of the things I’ve tracked for many years. I use Goodreads to track my reading, and I’ve done this since 2018. I used to do some book reviews or track counts on my blog, but I gave that up as Goodreads has great Kindle integration.

The Numbers

The numbers are constantly in flux, as I never stop reading, but as of this writing (12/16), here are the totals by year:

  • 2024 – 131  – updated 12/16
  • 2023 – 111
  • 2022 – 105
  • 2021 – 118
  • 2020 – 82
  • 2019 – 128

This was my biggest year, likely because of all the time I spent in airports and on planes. I tend to read a lot during those stretches.

A book isn’t just a book, and I’m not sure how to easily get a count of pages read. Even then, with forewords of different lengths, citations/references/etc., it’s hard to know if I read more or less this year. However, from Goodreads, I could get this:

  • 2024 – 51753
  • 2023 – 34157
  • 2022 – 29517
  • 2021 – 29058
  • 2020 – 22932
  • 2019 – 42349

More pages in books this year.
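Since book counts and page counts both vary by year, one rough way to compare years is pages per book. The sketch below uses only the two lists above; the names `books`, `pages`, and `pages_per_book` are just illustrative, and the Goodreads page counts still include forewords, references, and so on, so treat the result as an approximation.

```python
# Books read and pages read per year, copied from the two lists above.
books = {2024: 131, 2023: 111, 2022: 105, 2021: 118, 2020: 82, 2019: 128}
pages = {2024: 51753, 2023: 34157, 2022: 29517, 2021: 29058, 2020: 22932, 2019: 42349}


def pages_per_book(books, pages):
    """Return the average number of pages per book for each year."""
    return {year: pages[year] / books[year] for year in books}


averages = pages_per_book(books, pages)

# Print the most recent year first, rounded to whole pages.
for year in sorted(averages, reverse=True):
    print(f"{year}: {averages[year]:.0f} pages per book")
```

Read this way, 2024 has both the most books and the highest average length of the six years, so it wasn’t just a pile of short books.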

I tend to read a lot of series, and this year, there were 18 authors from whom I read multiple books. I don’t have a series grouping, but these are the top authors:

  • J.N. Chaney – 15
  • Douglas Pratt – 10
  • Jeffery Deaver – 9
  • Jack Slater – 9
  • Steven Konkoly – 9
  • Stephen Taylor – 8
  • Harlan Coben – 6
  • Robert Dugoni – 6
  • Brad Lee – 6

Harlan Coben has a series (Myron Bolitar), but I think most of these were standalone books. Most of the Deavers were either the Colter Shaw series or ones from Lincoln Rhyme that I hadn’t read. The Douglas Pratt, Logan Ryles, Steven Konkoly, and J.N. Chaney series were new ones I discovered. I ripped through quite a few of these on trips, storing them in my Kindle app and reading offline.

  • Devices used: 3

I only used my computer a few times to read during lunch, but almost all my reading is on my mobile phone. It’s a 6”+ screen and it works well. The third device was also a phone, which I replaced this year.

I tried to use a small Kindle device in the past, but my daughter appropriated it last year and I haven’t used it since. I don’t miss it and while I carried a Kindle Fire a few times on trips, I ended up barely using it for video and never for reading.

Genres

I tend to read mostly fiction, as this is an escape for me. However, I do try to keep a few business/career related books in play. This year, by my count, I read these genres:

  • Thrillers: 81
  • Sci-fi: 19
  • Mystery: 17
  • Legal: 6
  • Nonfiction: 4

The nonfiction books were mostly for work, though I did read Geddy Lee’s bio, which I enjoyed. I roughly categorized these by author, so keep that in mind. Some of the thrillers likely fall into mysteries or something else, but I wanted just a quick look at my world.

The Highlights

I’ll recommend a few different items. I read a lot from my Kindle Unlimited subscription, as well as borrowing from my local library. If you buy from a link, you’re funding my reading habits.

I’d recommend all the non-fiction books I read, which were these:

I got lucky as I have a couple others in progress that I am not sure I’d recommend. In the sci-fi area, I really enjoyed the Sentenced to War series. It’s 15 books, so a nice long time to learn the characters. The main one is a little immature, but we watch him grow up, which was neat.

I’ve also been trying to catch up on the Spinward Fringe series. I loved Hunters: Broadcast 16, but if you want to start, Broadcast 0: Origins is free.

In the thrillers area, The Never Game caught my eye. It’s the first book in the Colter Shaw series from Deaver, and the basis for a new series on CBS. I enjoyed these 4, though the short story wasn’t great (1.5). These got me to look back for Lincoln Rhyme books I’d missed.

I tried some Crichtons I’d missed, but didn’t love them. I did, however, really enjoy the Prosecution Force series; book one is Brink of War. It’s a crazy tale, but I enjoyed the character and how he can’t quite trust anyone. I need to get a few more from this author.

In the legal area, I’ve enjoyed the books of Robert Dugoni. This is an area I’ll get stuck in sometimes, usually with a new (to me) Grisham novel, but in this case, the Tracy Crosswhite series was good last year and I liked the David Sloane series this year. The first is The Jury Master.

That’s the year in reading, which was a fun one for me. Who knows where next year will go? Right now I’m in a bit of a sci-fi channel, with 2-3 business books on tap.


My 2024 in Data: Speaking

This is my last week of the year working (I guess I come back on the 30th for a minute), so I decided to do some analysis of my year. I like data and numbers, so I’m looking at a few aspects of life this year with data I’ve compiled from previous years.

Today I’m looking at my speaking efforts in 2024. Since I finished all my commitments in November, this is an easy post to write.

Speaking Stats

Here are the gross stats for public speaking, as free and paid events.

  • Events: 37
  • Talks: 68
  • Virtual talks: 7
  • free, community events (non UGs): 9
  • user group talks: 3
  • Redgate events: 20

This was also a year when I spoke at two events that repeated during the year. THAT! Conference was in Austin and Wisconsin Dells and I was honored to go to both. I was also at both of the Denver Dev Days (May and Oct).

Lots of these required some trips, but some were local or remote, so this didn’t feel like a big speaking year.

In terms of sessions, I delivered 19 different talks throughout the year, with a number of repeats. I like having 5-6 current talks that I can repeat at different events as this helps me practice them and deliver a smoother performance.

Most of my talks were for smaller audiences (fewer than 100 people), but I did have a few keynotes this year with audiences of a couple hundred attendees. Maybe my largest talk was to about 300 people + hundreds online at DevOps Days Minneapolis.

Many of these events were great, and I’m looking forward to more in 2025. A few that I’m hoping to get back to in 2025:

  • VS Live (none in 2024 other than Live 360)
  • THAT Conference
  • DevOps Days (either Minn or elsewhere)
  • SQL Sat Baton Rouge – one of my favorites and I hope to get back in 2025.

My 2024 in Data: Travels

This is my last week of the year working (I guess I come back on the 30th for a minute), so I decided to do some analysis of my year. I like data and numbers, so I’m looking at a few aspects of life this year with data I’ve compiled from previous years.

Today we look at travels, which is something I do a lot each year.

The Report

Let’s look at my Power BI Report. Quite the year, traveling to three countries and spending nights in 35 cities. My home city/country are included, so the rest are other places. As you can see from the map, I was lucky to get to Australia, the UK, and Italy.

[Image: Power BI travel report and map]

One of my favorite spots this year was Montepulciano, where my wife and I spent a relaxing day.


A few other numbers (ignoring home):

  • US States visited: 15
  • Nights in hotels: 119
  • Nights in hotels not for work: 36
  • Trips: 26
  • Flights: 68
  • Overnight train trips: 1

The non-work hotel nights are likely a bit low, as my wife was with me and we extended some work trips by a day and relaxed. Otherwise, I’d have likely tried to miss a few hotel nights and get home.

I have no idea how many miles I traveled, but it was a lot. I had some long trips this year as well, which were made easier (for most of them) with my wife coming along. Australia was 19 days and the UK/Italy/UK was 14 days.

I also had some crazy stretches, which were hard, but fun trips. The first was this turnaround:

  • Sun – Wed – London, UK for an event
  • Thur – Cambridge
  • Fri – fly to Colorado
  • Sat/Sun – Colorado, coaching
  • Mon – fly to Sydney, AUS

My body was a bit messed up after that one. The next one was a bit easier.

  • Fri – leave Melbourne to fly home
  • Sat-Sun home
  • Mon – fly to NY

Another trip crossing almost half the world. The last one was also a bit crazy.

  • Wed – Fly to Dallas
  • Thur – event + Fly back to Colorado
  • Fri – Fly to Syr
  • Sat – speak, fly to New Jersey, then London overnight

That last one was OK since I spent 5 days in Cambridge and then a week in Tuscany with my wife. Glad I went home for a night though.

All in all, this both was and wasn’t one of my biggest travel years. I had the same number of trips as last year, but they were more spread out. No trips in Dec, and some nice stretches at home throughout the year. Last year was more hectic with lots of small trips that kept me leaving home almost every week for a lot of the year.

I also didn’t see as many countries this year, and no new ones, but we had some memorable and amazing vacations, which made for a better year.

I also crossed 1 million flight miles on United this year, which is a nice milestone for someone who travels a lot. My wife gets to share my status, which is nice as she often travels solo and meets me in places.

Looking forward to 2025, though I have no idea what to expect for travel.
