Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.
I saw a question recently on Full Text Search. I knew the answer, but to test some code, I had to reset up an index, which took just a minute, but I decided to write about it. This post gets the basics of setting an index.
A full text index allows you to search a little more freely than standard T-SQL with a LIKE or wildcards. It’s useful for going through large amounts of text, mainly hundreds or thousands of words.
To get started, you need to know a few things. First, this system in modern SQL Server (2008+) is set up on all instances. You don’t enabled FTS like you would for In-Memory OLTP tables or FILESTREAM.
Next, you need a catalog for the FTS indexes, which is a logical container.
Next, a table with data.
Finally, you create the index. In this post, I’ll look at SSMS and the GUI. In another one, I’ll look at the T-SQL itself.
The quick way to get started is to right click your table. As noted, the database is already enabled for FTS. In the right click menu, there is a Full-Text index section, and under there there is a “define” choice, as shown here.
Click that and the wizard will start. The intro screen appears, and you can click next.
A unique index is required, and the next step let’s you pick the one you want to use. This allows the various FTS query items to return the key value used here in the index. In my case, I only have a PK, but if you have unique indexes, you can choose any one.
The column(s) you want to index need to be selected. I only have one here, but you can choose any, or multiple, character or image based columns. Image would be binary columns that might contain something like a Word document.
The statistical semantics are used to extract key phrases from documents. For basic FTS of character data, you wouldn’t use this, but if you are searching things like PDFs, Word, etc., you might enable this.
If you want to track changes as the data changes and update the index, choose auto or manual.
The next page is where you assign this to a catalog. This is where you can create one if necessary. You can also choose a different filegroup for storing the index, and set the sensitivity for accents and choose a stop list. If you don’t know these terms, something to look up (and blog about).
The next step is the population schedule. This is the place where you decide when the index is updated. You can allow the system to run when it detects changes if this isn’t a lot of data, or you can schedule this. Large indexes with lots of data can take time and consume resources, so some instances need this scheduled during low workload hours.
I tend to ignore this for demos.
You get a final summary of everything. You can check each item and go back if necessary to fix something. Or click finish.
For my small demo table, this completed quickly.
From here, I can run queries using CONTAINS() or other terms, as shown below.
If you want to follow along, here’s the table setup I used.
CREATE TABLE FTSTest (
myid INT NOT NULL IDENTITY(1,1) CONSTRAINT FTSTestPK PRIMARY KEY
, mydata VARCHAR(MAX)
INSERT dbo.FTSTest (mydata)
VALUES ('Now is the time for all good men to come to the aid of their country'),
('there are a number of men who are good in the world'),
('good for men that help others'),
('If there are men who others might consider good, we should support them'),
('Good is a concept that is sometimes hard for men to comprehend'),
('Good is a concept that is sometimes hard for anyone to comprehend')
That’s a quick setup. I’ll look at queries and the T-SQL setup in another post.
This post was the first of a few that I made after solving a problem for someone. I took my 10 minutes of code writing and added about 10 minutes each for a few posts, including this one.
A good way to break down a problem into a few posts and get a few weeks worth of content that shows your knowledge and learning.