Data Analysis Without a Server

Most of us who subscribe to this site are data professionals. We work with large amounts of data for our organizations, usually stored on server-class systems with TB-sized, high-performance storage. Whether on premises, in the cloud, or in another data center, our employers have invested to provide high-quality data services for our clients. That investment is often a big decision, and setting up a new system to handle lots of data as a data analysis experiment can seem to take ages.

The R language has been popular for data analysis for years, though the data sets examined were often limited in size, usually because of workstation limitations. One of the reasons Microsoft added R services to the data platform was to move analysis closer to big stores of data and increase the ability of organizations to “operationalize” or deploy their analysis and models to a wider audience.

Deciding when to make that investment can be tricky, but the more value someone can prove from a smaller experiment, the more likely it is that an organization will decide to move forward. Recently, I ran across an interesting article in which the author analyzed a billion-row dataset on a commodity laptop. In this case it was a MacBook Pro costing US$4000, but that’s a pittance compared to investing in HDFS storage, a Big Data Cluster, or even a large cloud experiment.

What caught my eye here is that the analysis tool used, OmniSciDB, was engineered to run on CPUs, not GPUs, and performed very well in analyzing the data. I haven’t found the time or set up the disk space to try and load the billion rows into a SQL Server columnstore index, but I’d be curious how that might perform on the same data. The queries run are fairly simple aggregations, and my guess is SQL Server would perform extremely well once the index was built. If someone else wants to try it and take notes, I’d love to read the experiment as an article on SQLServerCentral.

It has become more and more likely that before we embark on any large project in an enterprise, we perform some sort of prototyping and development on a small system. I think that’s true whether we’re building a web app or setting up a data science experiment that might drive our business forward. I always enjoy reading about someone who has tried a large-scale analysis experiment on a workstation, not a server, and I hope we continue to see more people doing this and sharing their results in the future.

Steve Jones

Listen to the podcast at Libsyn, Stitcher or iTunes.


Don’t Forget Unique with FILESTREAM–#SQLNewBlogger

Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.

While testing FILESTREAM with SQL Clone the other day, I kept getting an error while trying to create a table. I’d click Execute and see this:

[Screenshot: the error message returned in the SQL Server Management Studio query window]

My mind kept focusing on the ROWGUIDCOL part, and not thinking about UNIQUE. It’s been a few years since I worked with FILESTREAM, as it’s not an Azure feature and things have been going that way for me.

In any case, after running this a few times, and then checking an old demo, I realized that I had forgotten UNIQUE as an attribute for the column. Once I added that, it worked.

The docs for CREATE TABLE list UNIQUE as a constraint property, but the FILESTREAM section does say this about the ROWGUID column: “This column must not allow null values and must have either a UNIQUE or PRIMARY KEY single-column constraint.”
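As a minimal sketch of what that requirement looks like in practice, here is a table definition that satisfies it. This is not the table from my test. The table and column names are hypothetical, and the script assumes FILESTREAM is enabled on the instance and the database already has a FILESTREAM filegroup.

```sql
-- Hypothetical table for illustration only.
-- Assumes FILESTREAM is enabled and the database has a FILESTREAM filegroup.
CREATE TABLE dbo.DocumentStore
(
    -- The ROWGUIDCOL column must be NOT NULL and carry a UNIQUE
    -- (or PRIMARY KEY) single-column constraint.
    DocumentID   UNIQUEIDENTIFIER NOT NULL ROWGUIDCOL UNIQUE DEFAULT NEWID(),
    DocumentName NVARCHAR(200)    NOT NULL,
    -- The FILESTREAM data itself is stored as VARBINARY(MAX).
    DocumentData VARBINARY(MAX)   FILESTREAM NULL
);
```

Leaving UNIQUE off the DocumentID column is the kind of omission that causes the CREATE TABLE statement to fail.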

Don’t forget this, but if you do, read the error message.

SQLNewBlogger

This post took me about as long to write as it did to realize I was being silly and forgetting to read. Overall, this was about 10 minutes to compile, take the screenshots, and get the references.

When you write, look for places you’ve learned something, and then use those as ideas for blogs.


Daily Coping 24 Aug 2020

I started to add a daily coping tip to the SQLServerCentral newsletter and to the Community Circle, which is helping me deal with the issues in the world. I’m adding my responses for each day here.

Today’s tip is to cook your favorite food for someone who will appreciate it.

My favorite food is pizza, and no one really appreciates it. Margaritas are second, but I’m not sure those count as cooking.

However, I do enjoy a number of foods, one of which is nachos. I don’t make them often, but I decided to end a week recently with a nice Friday night spread of nachos.

[Photo: the Friday night nachos]

I skipped making guac this week, but I did do half shrimp and half beef, since I have some vegetarians.

A big hit, quite enjoyable, and a nice way to end the week.


Looking Back at Software Development Trends

In some ways, the world of software development hasn’t changed much. The same sorts of skills and techniques I saw people using on COBOL programs hitting DB2 and C++ over Oracle 6 are used today in React and C# against Azure SQL Database. On the other hand, it does seem that we are more mature in how we work together and in the flexibility with which we design systems.

I saw the results of a 2019 survey that Atlassian ran for software developers. It was a look at what modern trends might exist, though over a year later, perhaps the world has changed dramatically again. Let me look at each of the four trends they point out.

I’ve been hearing about microservices for years, but have found relatively few customers using them. I don’t hear a lot about them in the RDBMS space, and I think this is because the idea of separating out each entity (or small set) in a database and having data access only through a front end component doesn’t make sense. There’s power in using an RDBMS to enforce data integrity rules and allow aggregations. I’ve also seen some companies looking for miniservices, not micro ones.

Manual testing is still very prevalent for customers. While there are lots of unit tests, including some against a db, they don’t always extend through CI to more complex scenarios. Quite a few customers still have bottlenecks where humans look at the application in a larger sense. I think the high cost of tools that run more complex tests is a part of the problem. I’m not sure how we improve this, though I do hope to see more unit or functional tests for db code.

Feature flags are extremely complex, though I do see them more as a defensive measure, where features are released, but then turned off if there are issues. This prevents a rollback from the app and db standpoint. I also don’t see a lot of use of dark deploys for database features, perhaps because until the app is working, we aren’t sure the data model is correct. Feature flag cleanup is certainly an issue for some clients.

The last trend is one I rarely see implemented. Looking at customer outcomes, and not just immediate sales, is something that few people seem to do. Perhaps that’s because developers are evaluated on the work they complete, not whether it’s in use. Managers are evaluated on getting developers to do work, or on the sales that are produced, but it seems that few organizations try to measure the customer impact. We’ve started doing that at Redgate, and I’m interested to see how this evolves.

Keeping developers motivated, excited about their jobs, and productive with creative solutions is tough. The trends listed from the article seem to me more aspirational for most of the organizations with which I deal. Most clients I know see developers as interchangeable parts, similar to factory workers. I think if they invested a little more in the well-being of developers, both from their mental focus and the growth of their skills, they might find a lot more benefits accruing from their software.

Steve Jones

