Database Collation Matters for Unicode: #SQLNewBlogger

While trying to work with Unicode data, I found some issues with collation. This post showcases what I’ve seen, with probably not enough answers. The collation/UTF stuff is still slightly confusing to me.

Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.

Noticing Problems

I was doing some testing with Unicode data and noticed this sentence in the docs for UNISTR() (image below): “The database collation must be a UTF-8 collation if the input is of char or varchar data types.

2025-12_0088

I started experimenting with SQL 2022 with a default, US database. I ran this code:

SELECT N'Denver ' + NCHAR(0x1F601), DATABASEPROPERTYEX('sandbox', 'Collation')

That gave me unexpected results. The inputs aren’t char or varchar. They are NCHAR.

2025-12_0089

Strange. I’d have expected this to work. Let’s try the COLLATE clause. That should help.

It doesn’t.

2025-12_0091

One Solution

I decided to create a new database to test things. First, I ran this code to create a database using a UTF-8 collation:

CREATE DATABASE UnicodeTest COLLATE Latin1_General_100_CI_AS_SC_UTF8

Next, I tried my test. Same code as above, different database.

2025-12_0093

This works. I see my Unicode characters.

Why, I’m not sure. I would think that my requesting a collation for a query would work, but I see this in the docs, which notes this is for ORDER BY.

2025-12_0094

In the Write International T-SQL Statements doc, there is this:

2025-12_0095

I’m not sure what UCS-2 means when I’m querying in memory only, but apparently this matters.

An Explanation

The real answer is found in the NCHAR() docs. In here, the arguments section notes this:

2025-12_0096

The key is the Unicode value. NCHAR() handles up to 0xFFFF (4 Fs). My value is 0x1F40E (5 characters), so it’s out of range for the values that are handled with a non SC collation.

If I return to my Sandbox, non SC collation database, I can get Unicode characters, as long as they are below the FFFF threshhold.

2025-12_0097

A fun little experiment, where I learned something.

SQL New Blogger

This is a great example of my finding a problem, digging in, and solving it. Around some other work, this probably took me about 30 minutes to figure out with some reading and experimenting. Then about 15 minutes to write this post.

This is something you could easily do and showcase your knowledge as someone looking to learn and grow.

Unknown's avatar

About way0utwest

Editor, SQLServerCentral
This entry was posted in Blog and tagged , , . Bookmark the permalink.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.