When to Use a Database

One of the trends of the last ten years has been for many developers to avoid using a relational database where possible. Some look to NoSQL data stores, and others even consider flat-file stores of JSON or other formats that let developers work with speed and agility. Quite often, though, applications grow to require some sort of relational store, often as an additional data store.

I ran across an article from a data science and analysis developer who often works in R or Python on datasets. At one point, the post notes that when your datasets become larger than memory, you might want to consider using a local database of some sort.
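As a rough sketch of that idea, here is one way to move a file that won't fit in memory into a local database a batch at a time, then let the database do the aggregation. The choice of SQLite, the `sales` table, and the `region` and `amount` columns are my own assumptions for illustration; none of them come from the article.

```python
import csv
import io
import sqlite3
from itertools import islice


def load_csv_in_batches(conn, lines, batch_size=1000):
    """Insert CSV rows into a hypothetical 'sales' table, one batch at a time."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    reader = csv.DictReader(lines)
    total = 0
    while True:
        # Only batch_size rows are held in memory at once.
        batch = [(r["region"], float(r["amount"])) for r in islice(reader, batch_size)]
        if not batch:
            break
        conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", batch)
        total += len(batch)
    conn.commit()
    return total


def totals_by_region(conn):
    """Let the database aggregate instead of loading every row into Python."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


# Tiny stand-in for a large file; in practice, pass an open file handle.
sample = io.StringIO("region,amount\nwest,10.5\neast,4\nwest,2.5\n")
conn = sqlite3.connect(":memory:")  # use a file path to keep the data around
loaded = load_csv_in_batches(conn, sample, batch_size=2)
print(loaded, totals_by_region(conn))
```

Swapping `":memory:"` for a file path gives you a database that survives between sessions, so later queries don't need to reload the source file.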

Actually, the first question the author asked was “when is your data too big?” Their answer: when operations take a long time, which was 20 seconds for the author. I tend to agree, as I am looking for Notepad-like startup performance for apps, and query results in low 10s of seconds.

Most people who perform some sort of data analysis understand tables. Whether in R, Python, or even Excel, the table structure for data is familiar and easy to work with. While some analysts might not be overly concerned about normalization, that isn’t always a problem in situations where data is loaded into systems and rarely (if ever) updated. In these cases, just having a database of some sort could speed up your work.

I think you ought to use a database early, if for no other reason than that it is good practice for loading and storing data in a form that is persistent and scalable, and that often performs better over time across disparate queries and data manipulation. While experimentation is quick with in-memory tools, I think a database is better suited to queries over time.

I know I’m biased, but if you find data scientists and other analysts struggling with data sets, offer them a database. They can easily share data, you can protect it with backups, and you might find that you both learn a few things from working together.

Steve Jones

Listen to the podcast at Libsyn, Stitcher, Spotify, or iTunes.


Copyright Steve Jones 2018