Words vs Data

I would guess that most of you reading this are very comfortable looking at data for insights and answers. You might even prefer to provide a result set instead of a picture or chart to a user when they are asking for help with data analysis. However, do you add any words to your analysis to help? Any descriptions, summaries, or conclusions that could be drawn from the data or the picture?

I ran across a blog asking about the right ratio of words to data. The post uses the childhood story of Goldilocks and the Three Bears. Many of you might know the story and have drawn your own conclusions of what the story shows or means. If you read this post, you will find a very different interpretation. While some of you may not think that’s a valid interpretation, it’s possible that some thought that when they first heard the story.

The point of the post is that we can provide data and pictures, but others might interpret things differently. Each of us has our own point of view, our own experiences, and our mood. That last one might lead us to focus on a piece of data or a part of the picture that the author didn’t intend for us to focus on, or didn’t think was relevant. Without any sort of guidance on the narration from the author, we don’t know how closely our interpretation matches theirs.

Many of us have certainly seen others spin data, especially aggregates and statistics, to suit a narrative. However, the idea of providing some narrative isn’t to hide or mislead, but rather to give context to what you see in the report. As the blog notes, don’t leave your readers’ interpretation to chance. Give them a “well-crafted, objectively reasonable narrative that is supported by your data.”

Or, if you don’t have one, let them know that and ask them to send you one back showing what they see or what they expect.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.


AIs Struggle with Family

This post looks at a logic problem posed to an AI, which it struggles to answer. It is based on an article noting that this question stumps most AIs. The article is at: https://futurism.com/logic-question-stumps-ai

This is part of a series of experiments with AI systems.

The Setup

I loaded a Deepseek model to compare with my Mistral model. I did this with the following commands:

docker run ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama pull deepseek-r1:1.5b
docker exec -d ollama ollama run deepseek-r1:1.5b
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v ollama-webui:/app/backend/data --name ollama-webui --restart always ghcr.io/ollama-webui/ollama-webui:main

From there, I had a website.
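The web UI is one way to talk to the model. As a minimal sketch, assuming the Ollama container above is listening on port 11434 and serving its `/api/generate` endpoint, a prompt could also be sent from a script. The helper names `build_request` and `ask` are hypothetical, not part of Ollama.

```python
import json
import urllib.request

# Assumes the -p 11434:11434 port mapping from the docker command above.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str) -> str:
    """Send the prompt and return the model's text (needs a running server)."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

With the container running, `ask("deepseek-r1:1.5b", "Why is the sky blue?")` would return the model's reply as a single string, which makes it easier to run the same prompt against several models.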

The Logic Problem

The article poses a question. I asked: “Alice has [5] brothers and she also has [3] sisters. How many sisters does Alice’s brother have?”

The response from the deepseek-r1:1.5b model was interesting. Here it is, with the entire response in italics.

First, determine the total number of children in the family by adding Alice’s brothers (5) to her sisters (3). This gives a total of 8 children.

Next, since Alice is one of those 8 children and she has 5 brothers, subtracting Alice from the total gives the number of brothers as 4. However, this calculation leads to a discrepancy because it doesn’t align with the original information provided.

Recognizing that there might be an error in the initial assumption, consider that Alice has 3 sisters, meaning there are at least 4 girls (including herself). Adding her 5 brothers results in 9 children, which still does not match the given numbers.

Finally, re-evaluating the problem, it’s clear that without additional context or information, determining the exact number of sisters a brother has is not possible with the provided details.

Solution:

Let’s break down the problem step by step.

  1. Total Number of Children in the Family:

    Alice has:

    • 5 brothers

    • 3 sisters

    Since Alice herself is one child, the total number of children is:

    \[
    \text{Total children} = \text{Number of brothers} + \text{Number of sisters} + 1 = 5 + 3 + 1 = 9
    \]

  2. Number of Brothers:

    Alice has 5 brothers in her family.

  3. Number of Sisters Alice’s Brother Has:

    Since the total number of children is 9 and there are 5 brothers, we can determine the number of sisters each brother has by subtracting the number of brothers from the total number of children:

    \[
    \text{Sisters per brother} = \frac{\text{Total children}}{\text{Number of brothers}} = \frac{9}{5}
    \]

    However, this division does not result in a whole number, which suggests that the problem might be interpreted differently.

  4. Alternative Interpretation:

    Considering that Alice has 3 sisters, it implies there are at least:

    • 1 (Alice) + 3 = 4 girls

    Including her own siblings, if each brother has an equal number of sisters, we can find the exact number by dividing the total number of children among all siblings:

    \[
    \text{Sisters per brother} = \frac{\text{Total children}}{\text{Number of brothers} + 1} = \frac{9}{5 + 1} = \frac{9}{6} = 1.5
    \]

    This result still doesn’t make sense in the context of having whole sisters.

  5. Conclusion:

    Given the information provided, it’s not possible to determine an exact number of sisters that each brother has because:

    • The total number of children is fixed at 9.

    • Each brother has a different number of sisters based on their position in the family tree.

    Therefore, with the current details, there isn’t enough information to definitively state how many sisters Alice’s brother has.

\boxed{\text{Cannot determine}}
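For comparison, the puzzle can be checked by listing the family out. The sketch below assumes Alice is a girl and that all the siblings belong to one family, which is the usual reading of the question. The function and the child names are hypothetical.

```python
def sisters_of_a_brother(alice_brothers: int, alice_sisters: int) -> int:
    """Count the sisters that any one of Alice's brothers has."""
    if alice_brothers < 1:
        raise ValueError("Alice needs at least one brother for the question to apply")
    # List every girl in the family: Alice herself plus her sisters.
    girls = ["Alice"] + [f"sister{i}" for i in range(1, alice_sisters + 1)]
    # A brother is not a girl, so every girl in the family is his sister.
    return len(girls)


print(sisters_of_a_brother(5, 3))  # 4: Alice plus her 3 sisters
```

The model's response never counts Alice as one of her brother's sisters, which is the step the puzzle depends on.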

That’s interesting, but is this a Deepseek problem? Let’s try another.

Mistral

Mistral is the first model I played with locally. Let’s ask it the question. This response was shorter, so I’ll add it as an image:

[Image: Mistral’s response (2025-01_0090)]

It seems the AIs don’t quite understand family, and they’re making simple mistakes in putting things together. I could see humans making this mistake, but they’d catch it quickly if someone pointed something out.

Let’s try one more thing.

What’s a Family?

I then ran this prompt: In a family, if I say Alice has 2 brothers and 1 sister, how many children are in the family?

Easy, isn’t it? If I have 3 siblings, whether brothers or sisters, then there are 4 kids. How does the AI do?

[Image: the AI’s response (2025-01_0094)]

I suppose Alice could be male (Alice Cooper), but even in that case, the AI seems to struggle to separate out Alice from the counts. For some reason, even after listing the siblings out, it counts 5 rather than 4. I guess 2+2=5 (for extremely confused AIs).
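The expected count is simple arithmetic, and it doesn't depend on whether Alice is male or female. A minimal sketch, with a hypothetical helper name:

```python
def children_in_family(brothers: int, sisters: int) -> int:
    """Total children when one child reports this many brothers and sisters."""
    # The child doing the counting is not among their own siblings, so add 1.
    return brothers + sisters + 1


print(children_in_family(brothers=2, sisters=1))  # 4
```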

Conclusion

This isn’t to imply the AIs aren’t useful or helpful, but rather that they aren’t intelligent in a human sense, or even logical in a human sense. They are very well-trained search engines that can put things together in a way that we can’t easily program. They can predict the way to approach problems by mimicking how humans think.

However, they don’t have common sense. They are just working with patterns that are very complex, but aren’t intelligent.

They are useful and can help you solve problems; however, they do make mistakes. Knowing something about the problem you’re asking them to solve is important. These are tools to help us, not tools that replace us.


Data Debt

I had never heard of data debt until I saw this article on the topic. In reading it, I couldn’t help thinking that most everyone has data debt, it creates inefficiencies, and it’s unlikely we’ll get rid of it. And by the way, it’s too late to get this under control. I somewhat dismissed the article when I saw this: “addressing data debt in its early stages is crucial to ensure that it does not become an overwhelming barrier to progress.” I know it’s a barrier, as I assume most of you also know, but it’s also not stopping us. We keep building more apps, databases, and systems, and accruing more data debt. Somehow, most organizations keep running.

The description of debt might help here. How many of you have inconsistent data standards, where you might define a data element differently in different databases? Maybe you have duplicated data that is slow to update (think ETL/warehouses), maybe you have different ways of tracking a completed sale in different systems. Maybe you even store dates in different formats (int, string, or something weirder). How many of you lack some documentation on what the columns in your databases mean? Maybe I should ask the reverse, where the few of you who have complete data dictionaries can raise your hands.
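To make the date example concrete, here is a rough sketch of what paying down that one bit of debt can look like in code. The formats handled are assumptions for illustration (integers as YYYYMMDD, strings as ISO or US-style dates), and `normalize_date` is a hypothetical helper, not something from the article.

```python
from datetime import date, datetime


def normalize_date(value) -> date:
    """Coerce a few common stored-date shapes into a datetime.date."""
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    if isinstance(value, int):
        # Assumed convention: integers are YYYYMMDD, e.g. 20250131.
        return datetime.strptime(str(value), "%Y%m%d").date()
    if isinstance(value, str):
        # Assumed string formats, tried in order.
        for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d"):
            try:
                return datetime.strptime(value.strip(), fmt).date()
            except ValueError:
                continue
    raise ValueError(f"Unrecognized date value: {value!r}")


# Three systems, three representations, one actual date.
rows = [20250131, "2025-01-31", "01/31/2025"]
print({normalize_date(v) for v in rows})
```

Every format added to that list is a decision someone has to document, which is where the debt usually creeps back in.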

For most of my career I’ve heard a couple of terms that I’ve never really seen implemented. There’s the famous “single version of the truth” for a system, which seems to break down whenever we add a reporting or warehousing system. Even inside a single database, often an OLTP one, it’s hard to get a truth because values are changing so fast. The other term is MDM (master data management), which promises to ensure that every element is tracked and tagged the same way. No misspelled customer names or outdated addresses. There has been no shortage of products I’ve seen to help people tackle this problem, but ultimately I think the amount of data debt is too high. By the time we realize we need MDM, we’ll never pay down that debt, mostly because too many developers have too many habits and legacy ways of capturing data that will never get integrated into any MDM dictionary.

The article seems like a great academic set of principles. Make sure you label all your data. Put governance in place, with good access controls. Train workers, establish accountability to properly manage data. Invest in scalable architectures. How many of you can add scale to your system easily? It’s always taken me jumping through a variety of hoops to do that. The cloud makes it easy.

For a month. Then when the bill comes, you’ll be scaling back down.

Really, we live in the chaos of the real world, where organizations are not one thing but a large number of people and groups, each with their own goals and processes, just trying to get enough done to keep the organization moving forward. There’s no real time to deal with data debt.

Except if you’re the ETL person. We mostly pay you to move data around and clean it as best you can. At least then the problem remains hidden from the report readers, who trust you’ve actually done the T portion of ETL correctly.

Steve Jones



Advice I Like: Why am I doing this?

You are never too young to wonder “Why am I still doing this?” You need to have an excellent answer – from Excellent Advice for Living

I’d say that you’re also never too old to ask this question. When you realize that you’re doing something out of habit or familiarity and not for other reasons, I think it’s worth re-evaluating the situation.

It’s very easy to get into a rut and keep doing those things, even things you don’t like, from habit, obligation, or just because.

Don’t do that. Instead, make sure you actively choose to do things. Things you want, things that you are obligated to do, committed to, or even things that matter to you. Don’t just keep doing something for no reason.

I’ve been posting New Words on Fridays from a book I was reading; however, a friend thought they were a little depressing. They should be, as they are obscure sorrows. I like them because they make me think.

To counterbalance those, I’m adding in thoughts on advice, mostly from Kevin Kelly’s book. You can read all these posts under the advice tag.
