2020 Advent of Code Day 6

This series looks at the Advent of Code challenges.

As one of my goals, I’m working through challenges. This post looks at day 6. I’m going to do this one in Python here, though I did solve it in other languages in my repo.

Part 1

This is another weird string grouping issue. The data load is a mess, meaning that there are groups of data I need to consider, and the groups are separated by blank lines. However, each group has multiple lines.

Ugh.

Easier in Python, where I can load the data line by line and break things. I do that with this code:

for answers in open("2020\day6\day6_data.txt").read().split("\n\n"):

In SQL, it’s harder. I bulk load into a table, the cursor through the data.

DECLARE pcurs CURSOR FOR SELECT lineval FROM Day6 ORDER BY linekey;
DECLARE
     @val VARCHAR(1000) = ''
   , @groups VARCHAR(1000);
OPEN pcurs;
FETCH NEXT FROM pcurs
INTO @val;
SET @groups = '';
WHILE @@FETCH_STATUS = 0
BEGIN
     IF @val > ''
         SELECT @groups += ' ' + @val;
     ELSE
     BEGIN
         INSERT dbo.Day6_Groups (groupanswers) VALUES (@groups);
         SET @groups = '';
     END;
     FETCH NEXT FROM pcurs
     INTO @val;
END;
     INSERT dbo.Day6_Groups(groupanswers) VALUES (@groups);
DEALLOCATE pcurs;
GO

Once that is done, I again have to do things differently. Python is easy, where I count the values and add them up.

answers = set(answers.replace("\n",""))
part1 += len(answers)

In SQL, I need to distinctly find the values, for which I need a function of some sort.

UPDATE dbo.Day6_Groups
  SET deduppedanswers =  DBO.REMOVE_DUPLICATE_INSTR(1,groupanswers)

Once that’s done the answer is the sum of lengths.

Part 2

More complex here. Now I need to match up the common answers among the groups. In Python, this isn’t bad. I use the intersection method to find out what matches between the groups.

for answers in open("2020\day6\day6_data.txt").read().split("\n\n"):
    matches = set.intersection(*[set(answer) for answer in answers.split()])
    part2 += len(matches)
print("Part 2: ", part2)

Fairly simple here, with the grouping of the answers in a set.

SQL is hard.I reloaded the data, and separated each group by a comma. This gave me data I could split up, keeping each group in a group.

2021-05-12 13_41_16-day6.sql - ARISTOTLE.AdventofCode (ARISTOTLE_Steve (56)) - Microsoft SQL Server

From here, I had a CTE for this, another to put these into groups of 2 strings by groupID. I then did a comparison for common characters across each group of 2 strings. This gave me partial matches, and I then compared all these in a group, taking the minimum number of matches. From here, I summed up the count of all the matches.

Posted in Blog | Tagged , , | Comments Off on 2020 Advent of Code Day 6

Daily Coping 24 May 2021

I started to add a daily coping tip to the SQLServerCentral newsletter and to the Community Circle, which is helping me deal with the issues in the world. I’m adding my responses for each day here. All my coping tips are under this tag. 

Today’s tip is to write down the memory of something good that happened in the last year.

I’ve had a lot of good memories in the last year. Some tough ones, but overall, a good year. I think one of the good things that went well last year was taking a few days to relax with my wife in the mountains. Without a lot of vacation travel, our mountain rental was open and we took advantage of it. We spent a couple days up there, and while we didn’t ski much, we did get to relax and enjoy time with each other.

Our lives are busy, and we don’t often get a few days completely off when we don’t have commitments. Finding 3 days that allowed us to read, relax, and unwind from life without animals, kids, work, and more was really nice.

Posted in Blog | Tagged , , | Comments Off on Daily Coping 24 May 2021

Back to Basics

This week I saw a post from Randolph West that covers binary and hexadecimal. It seemed basic and simple to me, but I’m old. I started learning about binary and hex in high school, where we used to tap our fingers to count in binary. Up was “0”, and down was “1”, and with one hand, you count to 31 (from zero). At the time, I thought it was important to know binary and hex to work with computers.

These days, I’ve rarely had to deal with binary and hex. They aren’t useless in programming, but they are much less used in these days of GB of memory and large data structures. We don’t encode things as often as we used to. Usually now hex just means I’m dealing with varbinary and need to cast it (or render it in some other way).

During the last few years, I’ve seen some efforts to get back to basics and publish more information that helps people get started or fill in gaps for areas that someone might have missed. Years ago Ed Leighton-Dick started the SQLNewBlogger challenge. I participated, and continue to write some basic posts every week or two. Grant had his database fundamentals series, and SQLServerCentral has published the Stairway Series for some time, giving people a way to get started in some new area of the data platform.

I think it’s important that we continue to cover the basics in all aspects of the data platform, as well as other areas. The world of technology is so wide, and new people are always starting their journey in their industry. Even for those of us that have worked with SQL Server or some other area for a long time don’t know everything about the platform. We often have as need to learn more about some niche and need some basics to help us. I think Azure Data Factory as a perfect example of this. Many people who worked with SSIS for years might need a few primers on the differences with ADF to understand a slightly new paradigm. Much of cloud computing needs basics, and constant updates to those pieces.

I know that the basics are always popular at SQL Saturdays and other conferences. The basics often get the most reads as articles, and I am hopeful that we will continue to see those with knowledge remembering that the basics are often the most important part of learning to build better software and systems for the future.

Steve Jones

Posted in Editorial | Tagged , | 1 Comment

Should You Learn R or Python?

I’ve been working on various skills over the last few years, trying to solve some simple problems in Python and PowerShell, in addition to T-SQL, to continue to improve my skills. It’s an interesting challenge at times, especially when I need to use new features or functions to which I haven’t had exposure in the past.

I also work with R lightly, as I need to build some Questions of the Day for SQLServerCentral and I try to alternate Python and R questions every week. This has caused me to dig in and try to learn more about the language and how to manipulate data.

Recently I was reading an essay from a consultant that works with clients using both R and Python. The piece talks about the differences and how these work to solve business problems. If you don’t want to read the entire thing, the comparison starts with the simple “you need both”, though there is more to the story.

The most interesting part of this for me was that the author notes that while these are good languages in different ways for data analysis, they aren’t great for data preparation and SQL is still required. Either a database like SQL Server or a platform like Apache Spark. Part of the reason is that R and Python aren’t very efficient, and as we work with more data and larger workloads, efficiency matters.

The other part of the piece I liked was the note that we need to collaborate and our work needs to be reproducible for others. I love having git for moving code around and keeping configuration files in a repository of some sort. It has certainly helped me take advantage of bits that others have written and easily reproduce their work on my system.

While some of us work with just SQL, I expect that we will get involved with other parts of projects and may need to help troubleshoot or improve code. I find both of these languages interesting and a nice complement to each other. I’ve also learned there are places where I much prefer one over the other, especially with some of the Advent of Code problems. Some are simple in SQL, but others are much more suited to Python. I haven’t tried them in R, but I bet some of them would be well suited to that environment.

If you have tried either, or have a preference, let us know. What are the advantages or disadvantages of each when you are working in a business?

Steve Jones

Listen to the podcast at Libsyn, Stitcher, Spotify, or iTunes.

Posted in Editorial | Tagged , , , | Comments Off on Should You Learn R or Python?