Advent of Code 2017 Day 4–#SQLNewBlogger

Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.

This is day 4 of the Advent of Code 2017. If you want to read about the puzzles, start with Day 1.

This is Day 4, which deals with checking a series of passphrases for duplicates. The idea is that for some input, each of the “words” in the string must be unique. This means that if I have a phrase of “dog cat horse”, it’s valid. If I have “dog cat bird cat”, it’s invalid.

This feels like an easy thing to do in SQL. Let’s begin.

First, I’ll build a table and load some data.

CREATE TABLE Day4
(
     passphrase VARCHAR(100)
)
GO
INSERT dbo.Day4
(
     passphrase
)
VALUES
    ('aa bb cc dd ee' )
  , ('aa bb cc dd aa')
  , ('aa bb cc dd aaa')

This is the test set. In this set, there are two values that are valid and one that isn’t. I need to figure out which is which.

This seems like a perfect place to use String_Split(). Since this is a small set (512 rows), performance isn’t a big concern. If it were, I’d be moving to use Jeff Moden’s string splitter.

For my tests, I decided to try something quick and dirty. I built a quick inline TVF to return string_split values. This is the code:

CREATE FUNCTION dbo.words
     (@input AS VARCHAR(100))
RETURNS TABLE
AS
RETURN ( SELECT value FROM STRING_SPLIT(@input, ' ') AS ss )

This made things easier for me to read. I like seeing the code clean, and this way I could easily most to a custom version of Jeff’s code if I wanted.

I quickly tried a short CTE with a CROSS APPLY to get the sets of rows that didn’t have unique passphrases. I thought this was easier since I know these are incorrect. If I look for corrects, it’s a harder issue as I’ll have counts of 1 in both correct and incorrect strings.

WITH ctePass
AS
(
SELECT d.passphrase 
        , ss.value, 
        cnt = COUNT(*)
FROM day4 AS d
     CROSS APPLY dbo.words(d.passphrase) AS ss
--WHERE d.passphrase = 'aa bb cc dd ee'
GROUP BY d.passphrase, ss.value 
HAVING COUNT(*) > 1
)

I also added a CTE that just gets me the total count.

, cteCount
AS
(SELECT total = COUNT(*) FROM day4)

Next, I added a CTE that would get my unique passphrases from the first CTE.

, cteUnique
AS
(
SELECT ctePass.passphrase
, cnt = COUNT(cnt) 
  FROM ctePass
  GROUP BY ctePass.passphrase
  )

Finally, I subtract the invalids from the total to get the vlaids. This gave me the answer that solved the puzzle.

SELECT total - COUNT(*) FROM cteUnique, cteCount
GROUP BY cteCount.total

Now let’s load the actual data. I cut and pasted data into a file. Let’s load that.

TRUNCATE TABLE day4
GO
BULK INSERT dbo.Day4 FROM 'e:\Documents\GitHub\AdventofCode\2017\Day4\Day4_Input.txt'
GO
SELECT 
  *
  FROM dbo.Day4 AS d

I can see I have 512 rows in my table, so my result must be between 1 and 512. When I ran this, I got the correct result.

A slight admission here. I misread at first, so I didn’t have the total and just got the 57. When that didn’t work, I re-read things and realized this. I tried to check for the valids, but realized that it was simpler if I just subtracted.

Part II

Part II is tricky, and I made a mistake the first time I tried to solve this. It took a bit, but I eventually figured out the issue.

In this part, we have to avoid anagrams. That means if the phrase as ‘car’ and ‘rac’, it’s invalid. Getting anagrams is tricky, and I looked around to find a post on SO that covers this. I adapted that code from Pawel Dyl to search for anagrams.

My first task was to put this into a function:

CREATE OR ALTER FUNCTION dbo.CheckAnagram
     (@w1 VARCHAR(50)
     , @w2 VARCHAR(50)
)
RETURNS int
AS
begin

declare @r INT;

WITH Src
AS
(
SELECT T.W1 ,
        T.W2
FROM
(
     VALUES
         (@w1, @w2)
) AS T (W1, W2)
) ,
      Numbered
AS
(
SELECT Src.W1 ,
        Src.W2 ,
        Num = ROW_NUMBER() OVER (ORDER BY (SELECT 1))
FROM Src
) ,
      Splitted
AS
(
SELECT Num ,
        Word1 = W1 ,
        Word2 = W2 ,
        L1 = LEFT(W1, 1) ,
        L2 = LEFT(W2, 1) ,
        W1 = SUBSTRING(W1, 2, LEN(W1)) ,
        W2 = SUBSTRING(W2, 2, LEN(W2))
FROM Numbered
UNION ALL
SELECT Num ,
        Word1 ,
        Word2 ,
        L1 = LEFT(W1, 1) ,
        L2 = LEFT(W2, 1) ,
        W1 = SUBSTRING(W1, 2, LEN(W1)) ,
        W2 = SUBSTRING(W2, 2, LEN(W2))
FROM Splitted
WHERE LEN(W1) > 0
       AND LEN(W2) > 0
) ,
      SplitOrdered
AS
(
SELECT Splitted.Num ,
        Splitted.Word1 ,
        Splitted.Word2 ,
        Splitted.L1 ,
        Splitted.L2 ,
        Splitted.W1 ,
        Splitted.W2 ,
        LNum1 = ROW_NUMBER() OVER (PARTITION BY Num ORDER BY L1) ,
        LNum2 = ROW_NUMBER() OVER (PARTITION BY Num ORDER BY L2)
FROM Splitted
)
SELECT @r =  CASE
                   WHEN COUNT(*) = LEN(S1.Word1)
                        AND COUNT(*) = LEN(S1.Word2) THEN
                       1
                   ELSE
                       0
               END
FROM SplitOrdered AS S1
     JOIN
     SplitOrdered AS S2
         ON S1.L1 = S2.L2
            AND S1.Num = S2.Num
            AND S1.LNum1 = S2.LNum2
GROUP BY S1.Num ,
          S1.Word1 ,
          S1.Word2

IF @r IS NULL 
  SET @r = 0
RETURN @r
end
GO

Next, I wanted to get a list of items to check. Again, STRING_SPLIT is what I used, but a Jeff’s code works as well. I decided to use another function, which will split the string and then do a cross join to run both combinations of works against each other to check for anagrams.

CREATE OR ALTER FUNCTION dbo.SplitandCheck
(@string AS VARCHAR(100))
RETURNS int
AS
BEGIN
DECLARE @r INT = 0;

WITH mycte
AS
(
SELECT checks = CASE WHEN a.value = b.value then 0
ELSE dbo.CheckAnagram(a.value, b.value)
end
FROM STRING_SPLIT(@string, ' ') AS a CROSS JOIN STRING_SPLIT(@string, ' ') AS b
)
SELECT @r = SUM(a.checks)
FROM mycte a

This seemed to work, but when I got the final result with this code, it was wrong. Too high.

WITH mycte
AS
(
SELECT d.passphrase, IsAnagram = dbo.SplitandCheck(d.passphrase)
FROM dbo.Day4 AS d
)
SELECT mycte.IsAnagram, COUNT(*)
FROM mycte
--  WHERE mycte.IsAnagram = 0
GROUP BY mycte.IsAnagram

I had checked for 0s, but also included other results to try and debug my code. As I went through here, I realized that some fo the input data had a string like “oicgs rrol zvnbna rrol”. Clearly “rrol is an anagram of “rrol”. Initially I was knocking those out as a cross join includes those anyway.

As a result, I added this code to my function.

IF @r = 0
AND EXISTS(
SELECT COUNT(*)
FROM STRING_SPLIT(@string, ' ') AS ss
GROUP BY ss.value
HAVING COUNT(*) > 1
)
SET @r = 1

This will check for duplicate values. If there are dups, then clearly we have an issue already.

This give me a set of items with 1 dup as well as those with anagrams. The 0 result is the count of valid passphrases.

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31