You Need Offline Backups

If you hadn’t heard about it, VFEmail may be dead. At least, that’s what the founder was thinking in this article. A malicious hacking incident took place last week, and though they’re back up and running, who knows whether customers will stick with them, or perhaps sue them out of existence. I wouldn’t be surprised, as a large number of their infrastructure servers were wiped out when the attacker reformatted them. These included mail servers, backup servers, and SQL Servers.

That’s quite an attack, and whether it was directed at the company or at some individuals, a large number of people may have lost their mailboxes and the mail they had stored on the IMAP servers. For individuals this is most likely an annoyance, but for businesses it could be catastrophic. Imagine your small business was hosted with them and all your mailboxes were lost, along with customer communications and who knows what else. Perhaps you could recover some data or keep the business going, but it would be a serious problem.

Could this happen with a cloud provider like Microsoft (Azure/O365), Google Apps, or AWS? Possibly, and while I’m sure they have backups, I’m not sure how reliable those might be for the average individual or small business. This worries me slightly, as I depend on Gmail and wouldn’t even try to back up the few hundred GB of mail I have. I’m not even sure how I would do it, though I don’t keep anything really important in there. In any case, I suspect that breaking in and somehow wiping out Gmail servers, along with their backups, would be very difficult.

This does make me think about a few customers I know who use online storage for backups. They assume they will always have either the primary server or the online backup server/share/bucket/container and can download data from it. The problem is that any online system the primary connects to can also be reached from the primary. If an attacker gains access to one, they can potentially get to the other.

The world seems to be moving towards more online storage, or, in the case of cloud vendors, a reliance on snapshots. That might be good enough for cloud vendors, but it’s certainly not for any on-premises system. It’s likely that an attacker, possibly with insider help, would wipe out backups first, then primary systems. I’d always want some sort of disconnected, offline backup of data, especially for database servers. I have seen Murphy’s Law strike two systems at once, so an air gap between copies of data just feels prudent.

Steve Jones

Posted in Editorial

A Supercomputer in My Pocket

I work with data on a regular basis, and I really depend on my cell phone to help me with both work and life. I regularly make notes and get ideas for articles and editorials from things that happen when I’m on the go. Without a smartphone, I’d be juggling a notebook and pen, perhaps pulling over to make notes that I’d transcribe later. Since I move between different vehicles and carry different bags, I’m not sure how well I’d keep track of notes on paper.

In the last decade, as I’ve purchased and upgraded mobile devices, I’ve been amazed at how many services I can use to help me. I can log into something like Instapaper to save an interesting article I might write about later, or capture a few thoughts in Evernote, or even send myself an email with an idea for a Question of the Day. I’ve done all of those things at various times when inspiration has struck.

In the last few years, I’ve started to use audio notes when I’m driving. I might hear something on the radio, or on a podcast, and need to make a note. Dictation apps have improved tremendously since the early days of Dragon speech recognition on 486 computers. While I still don’t completely trust speech recognition, what I’ve learned to appreciate is the ability to just record sound with an audio app and play it back later. There are dedicated audio recording apps, though I often just use Evernote.

I use a lot of data transfer and storage on my mobile device. So much so that when I switched to Google Fi a couple of years ago, I was disappointed to have only 32GB of storage. It was amazing for a kid who grew up with 300KB floppy disks to think that 32GB wasn’t enough, but it wasn’t. I was constantly juggling space and deleting things. I recently upgraded to a phone with 192GB, and I’m hoping that will satisfy my need for pictures, video, and notes.

It’s amazing to see just how far computing has come, giving us incredible capabilities on the go that were science fiction a quarter century ago. Many people appreciate the ability to review a document or spreadsheet, or even view a Power BI report, from any device. While I don’t often need to, it is nice to know that I can catch SQL Monitor alerts and view data on my phone, or even restart a VM from a cloud shell in my car if necessary.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (3.6MB) podcast or subscribe to the feed at iTunes and Libsyn.

Posted in Editorial

Practical Web Scraping–Getting Started

As part of my learning goals for 2018, I wanted to work through various books. This is part of my work with Python.

After going through the first chapters of a few of them, I decided to start my February learning with Practical Web Scraping for Data Science, which looks at data acquisition using Python to pull data from the web. I found the book interesting, and it also seemed like a nice setup for two of the other books (Power BI and natural language processing).

Like many people, I find lots of data on the web, but I’m constantly struggling to get it into a database. I find myself going through gyrations at times to get data. Even with the cool features of Power BI, getting data hasn’t been as smooth as I’d like, so I thought this would be a good book.

Part 1

The book opens with the basics of web scraping. We learn what it means and get a short overview of who uses the technique, with some specific examples. We also get a basic Python tutorial, which I skimmed. I know a bit about Python, and this was a very basic getting-started section.

The next part of the book deals with the basics of HTTP and how some networking works. This is interesting to me, though I’m not sure it matters for scraping. We’ll see. There is some discussion of GET requests and the HTTP standard, so perhaps that’s helpful. It is good to at least know which status codes might come back and which headers or parameters you need to use.
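To make status codes, headers, and parameters a little more concrete, here’s a minimal sketch of where they show up in a GET request. It uses Python’s built-in urllib rather than any particular third-party HTTP library. It only builds the request and doesn’t send it. The URL, the parameters, and the User-Agent string are placeholders I made up.

```python
from http import HTTPStatus
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint: example.com is reserved for documentation
BASE_URL = "https://www.example.com/search"


def build_get_request(params, user_agent):
    """Build (but don't send) a GET request carrying query parameters and headers."""
    # Parameters are URL-encoded and appended to the URL after a '?'
    url = f"{BASE_URL}?{urlencode(params)}"
    # Headers travel separately from the URL; many sites inspect User-Agent
    headers = {"User-Agent": user_agent, "Accept": "text/html"}
    return Request(url, headers=headers, method="GET")


def describe_status(code):
    """Turn a numeric HTTP status code into 'code reason-phrase'."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


req = build_get_request({"q": "web scraping", "page": 2}, "my-learning-script/0.1")
print(req.get_method(), req.full_url)
# urllib stores header names capitalized, e.g. "User-agent"
print(req.get_header("User-agent"))

for code in (200, 301, 404, 429, 500):
    print(describe_status(code))
```

To actually fetch the page, you’d pass the request to urllib.request.urlopen(). A 4xx or 5xx response raises urllib.error.HTTPError, and its .code attribute holds the status, which is where knowing those codes pays off.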

The third chapter starts to get code working. It opens with a discussion of HTML and how you can examine the structure of pages in your browser. This is a good reminder of, and basic tutorial on, the developer tools built into your browser, which you might want to use when building applications, especially those that scrape pages. There is also a basic CSS tutorial, which was good, as I needed a little refresher. I rarely deal with CSS, leaving that to others.
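The browser’s developer tools show a page as a tree of nested elements. Here is a rough sketch of the same idea in code. It isn’t from the book; it uses the standard library’s html.parser to print an indented outline of a small, made-up HTML fragment. It keeps only the id and class attributes, since those are what CSS selectors usually target. The TreePrinter class and the sample markup are hypothetical, and the sketch assumes well-formed markup where every non-void tag is explicitly closed.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TreePrinter(HTMLParser):
    """Build an indented outline of tags and text, roughly like a browser's Elements panel."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Keep only id and class, since those are what CSS selectors usually target
        shown = " ".join(f'{name}="{value}"' for name, value in attrs
                         if name in ("id", "class"))
        label = f"<{tag} {shown}>" if shown else f"<{tag}>"
        self.lines.append("  " * self.depth + label)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip the whitespace between tags
            self.lines.append("  " * self.depth + repr(text))


# Hypothetical markup, standing in for part of a sports standings page
SAMPLE = """
<div id="scores">
  <h2 class="title">Standings</h2>
  <table class="standings">
    <tr><td class="team">Team A</td><td class="wins">6</td></tr>
  </table>
</div>
"""

printer = TreePrinter()
printer.feed(SAMPLE)
printer.close()
print("\n".join(printer.lines))
```

Each level of indentation is one level of nesting. A CSS selector like table.standings td.team walks that same path.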

The final section of the chapter introduces the BeautifulSoup library, which is built to parse documents and, specifically, makes working with HTML easier. The examples use a Wikipedia page about Game of Thrones, but I added some of my own, trying to translate the techniques to a sports page. It worked OK, and I learned a few things.
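Here’s a minimal sketch of the BeautifulSoup basics applied to a sports-style table. The HTML is a made-up fragment, not a real page, and the code assumes the bs4 package is installed. The "html.parser" argument tells BeautifulSoup to use Python’s built-in parser, so no extra parser package is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for a sports standings page
HTML = """
<table id="standings">
  <tr><th>Team</th><th>Wins</th><th>Losses</th></tr>
  <tr><td>Team A</td><td>10</td><td>2</td></tr>
  <tr><td>Team B</td><td>7</td><td>5</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() returns the first match; keyword arguments filter on attributes
table = soup.find("table", id="standings")

rows = []
for tr in table.find_all("tr"):
    cells = tr.find_all("td")
    if not cells:  # the header row uses <th>, not <td>, so skip it
        continue
    # get_text(strip=True) returns the cell text without surrounding whitespace
    team, wins, losses = (td.get_text(strip=True) for td in cells)
    rows.append({"team": team, "wins": int(wins), "losses": int(losses)})

print(rows)
```

Converting the numbers to int right away makes the rows ready to load into a table later, which is the part I usually struggle with.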

The chapter closes by using regular expressions with BeautifulSoup, showing how you can search for elements and then start to copy data. It’s more complex and tedious, but then again, lots of programming is tedious. Once it’s working, it’s amazing.
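BeautifulSoup accepts compiled regular expressions as filters, both for attribute values and for text. This sketch runs against a made-up fragment loosely modeled on a wiki page. It keeps only the links whose href matches a pattern, then pulls a number out of a matching text node. The markup and the patterns are illustrative assumptions, not examples from the book.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical fragment loosely modeled on a wiki page
HTML = """
<ul>
  <li><a href="/wiki/Season_1">Season 1</a></li>
  <li><a href="/wiki/Season_2">Season 2</a></li>
  <li><a href="https://example.com/store">Buy the DVDs</a></li>
  <li>Episodes aired: 10 per season</li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

# BeautifulSoup applies the pattern with search(), so ^ and $ anchor the whole href
season_links = soup.find_all("a", href=re.compile(r"^/wiki/Season_\d+$"))
for a in season_links:
    print(a["href"], "->", a.get_text())

# string= matches text nodes instead of tags and returns the matching strings
episode_text = soup.find_all(string=re.compile(r"\d+ per season"))
counts = [int(re.search(r"(\d+) per season", s).group(1)) for s in episode_text]
print(counts)
```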

Experimenting

I started working with this in Azure Notebooks as a different way of tracking my Python work. I’ll want to store things in files at some point, but for now, this lets me start and stop learning and keep track of where I am without worrying about files and names.

Not sure if anyone can access it (it’s marked public), but my project and notebooks are here: https://notebooks.azure.com/way0utwest/projects/web-scraping-with-python

I ran some of the early scripts, which just get you used to working with Python and accessing web pages. I then copied some examples from my Calibre view of the book and ran them. I even tried to experiment a bit.

One note: copying the code seems to leave an invalid character in it that Azure Notebooks rejects, so I ended up editing the beginning of every line to remove the offending character.
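I never identified exactly what the stray character was. Text copied out of e-book readers often brings along non-breaking spaces, zero-width spaces, or byte order marks, so this sketch assumes it was one of those. find_suspects() reports any non-ASCII characters by line, so you can see what’s actually there. clean_pasted_code() then swaps out the assumed culprits. Both helpers are hypothetical; they’re not part of Azure Notebooks or Calibre.

```python
import unicodedata

# Assumption: the stray character is one of these common copy/paste artifacts.
# Run find_suspects() first to see what is really in the pasted text.
REPLACEMENTS = {
    "\u00a0": " ",  # NO-BREAK SPACE -> ordinary space (keeps indentation intact)
    "\u200b": "",   # ZERO WIDTH SPACE -> removed
    "\ufeff": "",   # ZERO WIDTH NO-BREAK SPACE / byte order mark -> removed
}


def find_suspects(text):
    """Return sorted, de-duplicated (line number, character name) pairs for non-ASCII characters."""
    suspects = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if ord(ch) > 127:
                # Fall back to the code point if the character has no Unicode name
                suspects.add((lineno, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return sorted(suspects)


def clean_pasted_code(text):
    """Apply each replacement in REPLACEMENTS to the pasted text."""
    for bad, good in REPLACEMENTS.items():
        text = text.replace(bad, good)
    return text


# Hypothetical pasted snippet: a BOM on line 1, non-breaking spaces as indentation,
# and a zero-width space at the start of line 3
pasted = "\ufefffor i in range(2):\n\u00a0\u00a0\u00a0\u00a0print(i)\n\u200bprint('done')\n"

print(find_suspects(pasted))
cleaned = clean_pasted_code(pasted)
exec(compile(cleaned, "<pasted>", "exec"))  # compiles and runs once those characters are gone
```

If find_suspects() turns up a different character, adding it to REPLACEMENTS is enough. That beats editing every line by hand.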

This gave me the basics of web scraping. Now to try to grab some data from another page and see what I’ve learned.

Posted in Blog

Learn about the State of Database DevOps Next Week

It’s one week until my webinar with Donovan Brown, leader of the League of Extraordinary Cloud DevOps Advocates, Principal DevOps Manager at Microsoft, and the guy who wants to Rub DevOps on Everything. He’s a passionate, intelligent, exciting guy whose enthusiasm is infectious. I’ve been honored to present with him at Build, and I’m looking forward to our chat next week.

Join us next Thursday for our webinar on the 2019 State of Database DevOps report. You can register now and we’ll be live at 11am EST.

The 2019 State of Database DevOps report is available now and you can download it today. We’ll be analyzing some of the findings and discussing what this means for many organizations.

Register today and I’ll see you next week.

Posted in Blog