If one index helps speed up queries, then more indexes should help more, right? They do, but they also come at a price, both in performance during data modifications (insert/update/delete) and in storage, since each index must be stored somewhere.
I have never bothered to index every column of a table. Actually, I'm not sure I've ever indexed every column of any table. Perhaps that's because I originally came from a dBase/Clipper/FoxPro environment, where I had to manage every index manually; adding too many indexes meant a lot of coding.
In SQL Server, each index is useful in two ways. When someone puts a filter in the WHERE clause, like this:
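The query itself isn't reproduced here, so as a stand-in, here's a representative filter against the Sales.SalesOrderHeader table in AdventureWorks (the table implied later, since it has both CustomerID and TerritoryID columns); the specific columns and customer ID are illustrative:

```sql
-- Hypothetical stand-in for the missing example query
SELECT SalesOrderID, OrderDate, TotalDue
FROM Sales.SalesOrderHeader
WHERE CustomerID = 29825;
```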
then an index on CustomerID will speed up the query. Instead of scanning all the rows of the table, SQL Server can seek into the index on CustomerID and find the matching rows. If this is the clustered index, the data can be read directly. If it's a non-clustered index, the server retrieves the row locator and then fetches the data from the clustered index without reading all the rows in the table.
In more recent versions of SQL Server, multiple indexes can be used. For example,
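Again, the example isn't shown; a query with predicates on two separately indexed columns, which could invite this kind of index intersection, might look like this (values are illustrative):

```sql
-- Hypothetical example with predicates on two indexed columns
SELECT SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
WHERE CustomerID = 29825
  AND TerritoryID = 4;
```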
In AdventureWorks, there are indexes on both CustomerID and TerritoryID for this table. It is possible (with lots of data) that the optimizer might choose to scan the CustomerID index for matching rows and the TerritoryID index for matching rows, join those two result sets together to get the rows for the overall query, and then read the clustered index for those specific rows. However, that's not something you can count on in SQL Server. Typically, only one index per table is used in a query.
So why not index every column?
First, if the columns are large, like varchar(max), text, or varbinary(max) columns, it doesn't make sense to build large indexes unless you query those columns often, and even then a full-text index is likely a better choice. (SQL Server won't allow these large object types as index key columns in any case.)
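As a sketch of that alternative, a full-text index on a large text column looks like this. This assumes AdventureWorks' Production.ProductDescription table and its primary key index name; a full-text index requires a full-text catalog and an existing unique key index on the table:

```sql
-- Assumes the default AdventureWorks schema and PK index name
CREATE FULLTEXT CATALOG ftCatalog AS DEFAULT;

CREATE FULLTEXT INDEX ON Production.ProductDescription ([Description])
    KEY INDEX PK_ProductDescription_ProductDescriptionID;
```

Once built, predicates like CONTAINS and FREETEXT can search the column far more efficiently than a LIKE '%word%' scan.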
Second, each time you change data (insert/update/delete), all the affected indexes must be updated at the same time. This means your write performance suffers, and that impacts read performance on the server as well, since resources are being used to maintain those indexes. The more indexes you have, the more work must be done to support any DML statement.
Lastly, you typically find that most of the time you query a table based on 3-5 fields, and those are the best candidates for indexes. For transactional tables, this is typically the number of indexes you want on each table. Reporting tables, or OLAP-type tables, might have more indexes, but those are tables that typically see mostly read activity and rarely any write activity.
Which columns do you index? Pick the columns that appear often in your queries and are fairly selective. You can always query the missing index DMVs for help in seeing which indexes the optimizer thinks it could have used.
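One common way to query those DMVs is to join the details, groups, and group stats views; this is a sketch, and the ordering/columns are just one reasonable choice:

```sql
-- Suggested indexes the optimizer wished it had, busiest first
SELECT d.[statement]      AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
  ON d.index_handle = g.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
  ON g.index_group_handle = s.group_handle
ORDER BY s.user_seeks DESC;
```

Remember these are only suggestions; the DMVs don't account for the write overhead discussed above, so weigh each candidate before creating it.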