I recently ran across a book written to address some of the statistical issues its author has found in many scientific papers. It has been released online for free, but if you like it and want a more portable format, you can buy it for a reasonable price.

I haven’t dug into the book much, and I certainly don’t consider myself a statistician. Statistics wasn’t one of my stronger subjects in school, and it’s a weakness I’d like to correct at some point. However, I have had to use some statistical functions in past applications, and I wonder whether I was using them correctly.

This week I decided to see how many of you are using more complex math in your systems. I’m hoping you understand how the functions work, but I wanted to ask what you’re using:

What statistical functions have you implemented in a production system?

I am thinking of functions beyond basic aggregates like SUM and AVG. Are you using standard deviations, linear regressions or some other complex functions? Have you made use of built-in functions in T-SQL, R, or some other language? Are you implementing custom functions in code or CLR Aggregates?
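As one illustration of how these functions can be used wrong, the sample and population forms of standard deviation divide by different counts (T-SQL exposes them as STDEV and STDEVP). The sketch below is a minimal, plain-Python version of both, plus a simple least-squares line fit. The function names `sample_stdev`, `population_stdev`, and `linear_fit` are hypothetical helpers made up for this example, not part of any library, and the code is meant as a rough guide rather than production-ready statistics.

```python
import math


def sample_stdev(values):
    """Sample standard deviation: divides by n - 1 (like T-SQL STDEV)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_diffs = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_diffs / (n - 1))


def population_stdev(values):
    """Population standard deviation: divides by n (like T-SQL STDEVP)."""
    n = len(values)
    if n < 1:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    squared_diffs = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_diffs / n)


def linear_fit(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

On a small data set the two standard deviations can differ noticeably, so it is worth checking which one a report or alert threshold actually needs: the sample form when the rows are a sample of a larger population, the population form when the rows are the whole population.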

I think this is one of the areas where our profession will grow steadily over the next decade. As we deal with more data of varying types, and our organizations look to gain a strategic advantage through deeper insight into their information, we will have plenty of chances to experiment and learn more about complex data analysis.

