Advent of Code–Day 2

I’ve continued working along, and while I found Day 2 to be straightforward in Python and PowerShell, I thought it was tricky in SQL I decided this one was worth a post, since I had to futz around a few times to solve it, and I managed a slightly different way than some others.

If you haven’t solved it, then try. Come back here later and compare solutions, but give it a try first.

Solution coming below, so don’t scroll if you don’t need the solution.

But first,

Missteps

I had a misstep in this one. I loaded the entire list of packages as separate lines into separate rows into a single column table. My challenge to myself was not to use ETL work to break this apart, or updates. I wanted a simple solution, thinking I didn’t want to take up extra space in the database.

As a result, I wanted a single query from a single string column that had the package size stored as one column, ‘2x3x4’ as an example.

My first attempt used the Moden Splitter function, which seemed to work well. I got three rows for each package. I then used a WIndow function to grab that data, order by the sizes, and then start performing calculations. When I didn’t get the right result, I started digging in.

One of the first things I saw was that I had multple packages with the same sizes. So I had two 22x3x1 packages, and when I used a partition based on the dimensions, I had calculation problems. That’s because the window partition doesn’t know that three rows are one package and three are another.

I could have fixed this with some other value to capture the package, maybe a row_number even, but I decided not to go down this route.

My Solution

I decided to break this down, and I used a series of CTEs to do this. I haven’t gone back to optimize things, or combine CTEs, which is possible, but instead left the CTEs as I wrote them to solve parts of the puzzle. Multiple CTEs are east, and they help examine the problem in pieces.

My first step was to parse the string. I don’t love this solution as it is limited to a three dimension package, but it does seem to be the easiest way to break down the dimensions of the package. My query looks to find the string positions for:

end of the first dimension
start of the second dimension
start of the third dimension.

This gives me the simple query:

with cteSplit (d, el, sw, sh)
as
(
select
dimensions
, endlength = charindex(‘x’, dimensions) – 1
, startwidth = charindex(‘x’, substring(dimensions, charindex(‘x’, dimensions),20)) + charindex(‘x’, dimensions)
, startheight = len(dimensions) – charindex(‘x’, reverse(dimensions)) + 2
from day2_wrappingpresents d
)

Once I had these values, a little math gives me the length, width, and height.

, cteDimensions
as
(select
   d
   , l = cast(substring(d, 1, el) as int)
   , w = cast(substring(d, sw, sh-sw-1) as int)
   , h = cast(substring(d, sh, len(d)) as int)
from cteSplit d
)

Now I’m in business. These two queries were fairly simple, despite all the nested functions. I’ve got integers with the dimensions of each package.

Now the tricky part. I want these ordered. They’re columns, not rows, and I can’t put an ORDER BY in the CTE, so I need to use some comparisons.

, cteOrder
as
( select
   d
, small = case
            when l <= w and l <= h then l
            when w <= l and w <= h then w
            when h <= l and h <= w then h
        end
, middle = case
            when (l >= w and l <= h) or (l <= w and l >= h) then l
            when (w >= l and w <= h) or (w <= l and w >= h) then w
            when (h >= l and h <= w) or (h <= l and h >= w) then h
        end
, large = case
            when l >= w and l >= h then l
            when w >= l and w >= h then w
            when h >= l and h >= w then h
        end
from cteDimensions
)

Not the prettiest code, and perhaps there are better ways to determine this, but this passed all my tests, and seemed to work.

I could have put the next part in the final query, but I decided to make this a separate CTE to easily read the math. I know some people don’t like lots of CTEs, but in this case, I think they make the query very readable. I should look back at this in six months and see what I think.

, cteFinal
as
(
select
d
, area = (2 * small * middle) +
(2 * small * large) +
(2 * middle * large)
, slack = (small * middle)
from cteOrder
)

Now I use a final outer query to sum things up.

select
sum(area + slack)
from cteFinal

The other thing I noticed here is that when I needed to solve the second part, I only had to change the math in the cteFinal to get the new values. It took longer to re-read the second part than to change the code and solve it.

I looked over how Wayne Sheffield and Andy Warren solved this in T-SQL, and I thought their approaches were interesting. I didn’t want to PIVOT or UNPIVOT anywhere, nor did I look at performance here. This runs so quickly, I’m not sure it matters, though I wonder if we were calculating across 1mm rows, would one be better?

I may look, but for now, I’ll leave that to someone else.

M	T	W	T	F	S	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31