While trying to work with Unicode data, I found some issues with collation. This post showcases what I’ve seen, with probably not enough answers. The collation/UTF stuff is still slightly confusing to me.
Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.
Noticing Problems
I was doing some testing with Unicode data and noticed this sentence in the docs for UNISTR() (image below): “The database collation must be a UTF-8 collation if the input is of char or varchar data types.”
I started experimenting with SQL 2022 with a default, US database. I ran this code:
SELECT N'Denver ' + NCHAR(0x1F601), DATABASEPROPERTYEX('sandbox', 'Collation')
That gave me unexpected results. The inputs aren’t char or varchar. They are NCHAR.
Strange. I’d have expected this to work. Let’s try the COLLATE clause. That should help.
It doesn’t.
One Solution
I decided to create a new database to test things. First, I ran this code to create a database using a UTF-8 collation:
CREATE DATABASE UnicodeTest COLLATE Latin1_General_100_CI_AS_SC_UTF8
Next, I tried my test. Same code as above, different database.
This works. I see my Unicode characters.
Why, I’m not sure. I would think that my requesting a collation for a query would work, but I see this in the docs, which notes this is for ORDER BY.
In the Write International T-SQL Statements doc, there is this:
I’m not sure what UCS-2 means when I’m querying in memory only, but apparently this matters.
An Explanation
The real answer is found in the NCHAR() docs. In here, the arguments section notes this:
The key is the Unicode value. NCHAR() handles up to 0xFFFF (4 Fs). My value is 0x1F40E (5 characters), so it’s out of range for the values that are handled with a non SC collation.
If I return to my Sandbox, non SC collation database, I can get Unicode characters, as long as they are below the FFFF threshhold.
A fun little experiment, where I learned something.
SQL New Blogger
This is a great example of my finding a problem, digging in, and solving it. Around some other work, this probably took me about 30 minutes to figure out with some reading and experimenting. Then about 15 minutes to write this post.
This is something you could easily do and showcase your knowledge as someone looking to learn and grow.


