Thanks to everyone that came to my talks. Slides are below.
If you have questions, please reach out.
I can’t remember how I heard about Small Data SF 2024, but it caught my eye. The mix of sessions had me interested in going, especially with MotherDuck and DuckDB being the main sponsors. I’ve run into DuckDB a few times in the last couple of months, so I was interested in what I could learn about small data and in meeting a different group of people than I normally see at events.
When a customer visit was cancelled, I requested the learning and development (L&D) time and budget and got it approved. I booked flights and a hotel and headed to San Francisco.
The structure of the conference was interesting to me. I’ve been lucky to get to a few smaller conferences in the past (100-200 people) and I like them. SQLBits, PASS Data Community Summit, and other events are nice, but I tend to like small events.
This conference had the tag line of “Think small, develop locally, ship joyfully.” There were other tags, and you can read their manifesto, but essentially this conference looked at the idea that lots of work with data (OLTP or analytics) can be done on small sets, with local databases or local data.
Day 1 started late, at 12pm with lunch. I liked that, though I took advantage of the late start to sleep in and get a late breakfast, so I mostly wandered around and chatted with people over a coffee. Food was nice, as it always is in San Fran. Lots of dietary choices, and mixes of stuff. The event was in a co-working facility, so there were always snacks around (chips, nuts, fruit, etc.).
Day 1 was two workshops. Each was 3 hours with a break between them. Essentially these were vendor sessions for hands-on work with a product. There was a happy hour after, but I skipped it.
My first workshop was from MotherDuck, a vendor building on DuckDB. The workshop was based on this GitHub repo, and showed how to use dbt to move some data around. It was hands-on, and things worked well for me, but this was a mix of CLI work, Python, database work, and more. Some people definitely struggled with the workshop. I found this interesting, and I learned a few things. I’m definitely interested in doing some DuckDB work to analyze data in a way that is different (and simpler) than Snowflake or Fabric. I could see people doing this.
The second workshop was from Outerbase, which essentially is a way to work with multiple databases on the web. It’s a light Object-Explorer/Query tool in some ways, but they’re also trying to do some AI work to help stub out a web interface for your database. They had us try to build some methods and web code that we could paste into React, Angular, or another framework. This one was OK, but I am not sure this is a great use of AI. I was hoping for a bit more.
Day 2 was all day, from 8:30 to 5:30. I arrived to find a lot of people getting breakfast. Again, hot food, cold food, GF, etc. Lots of choices. One cool thing was a coffee bar where you could get baristas to make a nice drink, but you could also get a MotherDuck mug for your drink. I have too many mugs, but I liked this one, so I got one.
This was a one-track conference, which I also like. Everyone gets a shared experience, we have common things to talk about, and things change often. I also don’t have to go find rooms. In this case, it was one large room with a low stage.
New talks every 20 minutes, on a variety of topics. The agenda was wide and varied. I think a few talks were meh, but most were interesting. I’ve got some editorials coming, but the first talk on Big Data was great, as was the second one on different tooling we might use for both development and analysis of smaller sets of data.
Note, small data doesn’t mean KB or less. The idea is that many queries can be run on GBs of data on a laptop, and with today’s network and laptop capabilities, this can make sense. There were also some limited-domain views of how you might shard your data across lots of databases and do more local work rather than relying on central db connections. That makes sense in some cases, but not all.
I also think some of the speakers (quite a few startup people) minimize or don’t think about the true scale problems when workloads grow, nor about the hassles of pulling all this data together and syncing it. In any case, I think their ideas work for some problem domains.
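To make the local-data idea a bit more concrete, here is a minimal sketch of an analytic query that runs entirely on a laptop with no server involved. It uses Python’s built-in sqlite3 module as a stand-in, not DuckDB, and the `orders` table, its columns, the `top_customers` function, and the sample rows are all made up for illustration.

```python
import sqlite3


def top_customers(rows, limit=3):
    """Sum order amounts per customer in a local, in-memory database."""
    conn = sqlite3.connect(":memory:")  # nothing leaves this machine
    try:
        # Hypothetical schema: one row per order.
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        cur = conn.execute(
            "SELECT customer, SUM(amount) AS total "
            "FROM orders GROUP BY customer "
            "ORDER BY total DESC, customer LIMIT ?",
            (limit,),
        )
        return cur.fetchall()
    finally:
        conn.close()


# Made-up sample data, just to show the shape of the call.
sample = [("ann", 10.0), ("bob", 5.0), ("ann", 2.5), ("cy", 7.0)]
print(top_customers(sample, limit=2))
```

The same pattern, a local engine plus plain SQL, is roughly what the small data crowd was describing, though a columnar engine would likely be a better fit for GB-sized analytic work than this row-store stand-in.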
There were a few panels, as well as a presentation of a paper from Amazon. One super interesting thing was that Redshift shows something like a 60:40 split of reads to writes. That seems crazy. However, an exec from Fivetran said that matches their experience, where many data warehouses are running constant updates from OLTP systems, something that their customers sometimes don’t realize. He wasn’t sure this was a good idea either, but it’s been good for their business.
As seems to be the case, there was a satirical talk on BI tools and how they don’t always help. An analyst for one of the political campaigns gave a humorous look at the world of vendors and customers.
After the last talk, there was a short happy hour, where I had the chance to chat with a few people. Silicon Valley is a strange place, full of people working in startups, formerly from startups, or wanting to start one. Everyone has a good idea, which I think is true, and so many of them want to chat about their thing or your thing.
As you might expect, a lot of people are AI-focused or thinking about AI. It’s neat to hear their experiences and what they think. Certainly I saw some neat demos of using small models (again small data) and feeding a user query into the model along with some data from a database or a flat file. That was interesting and something I think could be useful in different ways that are focused. I expect more and more people to get comfortable with AI-based work.
Ultimately, I had a nice, refreshing two days that got me thinking about data differently and how there are different ways to approach problems and solutions. Perhaps one of the neater things I saw was PySheets, Python in spreadsheets. Just don’t try it in Chrome, and make sure to use the little A* button to test the AI.
When is the last time you read an article/blog/etc. on the Internet and saw a button for a print-friendly version? That used to be something on every page, and one people often shared on social media (or email) because it didn’t have all the advertisements in it. I remember having to help code this feature on SQL Server Central when we started, as plenty of people wanted to print articles out and read them later. That desire led to Andy brainstorming that we should release The Best of books each year.
I was reading about how the Internet has changed many things in our lives and I thought about these links. I searched a number of places I visit often and there are no more printer links. I’m guessing with mobile devices and various save services, most people have gotten used to using digital technology to consume information?
I still print things at times, though fairly rarely. I don’t often consume anything on physical media anymore, including books. I’ve tried to read a few times on paper, but it’s inconvenient to me now. I have to remember to pack something or carry it, and I often need a light. It just doesn’t work as well.
I rarely see paper in use in meetings anymore at all. Whether I’m at a Redgate office or a customer site, most people seem to have monitors, projectors, sharing apps, and more, so paper is just rarely used.
At the same time, it’s not completely out of date. It works well and it’s simple. I see it used for announcements, for small handouts, signs, and menus. Quite a few of us don’t like the digital menus from QR codes and it seems most restaurants I’ve visited still create physical menus. Signs and announcements are the places I still see paper in use regularly. I will say I’ve seen a few people (a very few) using e-ink devices, which is something I’m tempted to use. I do find writing helps me remember things better.
The world continues to create more and more data, while finding more numerous and novel ways to disseminate it. For much of the time, the paperless office exists, and I see less and less use of paper for distributing information, but it hasn’t completely disappeared. Except, perhaps, from the web.
Steve Jones
Listen to the podcast at Libsyn, Spotify, or iTunes.
Note, podcasts are only available for a limited time online.
moledro – n. a feeling of resonant connection with an author or artist you’ll never meet, who may have lived centuries ago and thousands of miles away but can still get inside your head and leave behind morsels of their experience, like the little piles of stones left by hikers to mark a hidden path through unfamiliar territory.
Moledro is something I get mostly from music. I’m less interested or thoughtful about most visual arts, though perhaps media counts. In any case, the poetry of some artists creates a connection with me that sticks with me.
A few examples:
There are likely many more, but these stand out to me.
From the Dictionary of Obscure Sorrows