Loading All CSV Files with PowerShell

I ran across Denis Gobo’s post about working with names from Social Security cards and wanted to play with the dataset. However, rather than use xp_cmdshell, which I might have, I decided to use PowerShell. I need to repeat this with Python, but for now, this worked.

There is a file you can download that has a lot of .txt files, each one with a CSV. You can read about this in more detail in Denis’ post. Essentially there were three fields, and the year of the file in the name. I downloaded and unzipped the files into a folder.

Here was my process:

  1. Loop through the files
  2. Extract the year from the file name
  3. bcp in the file into a staging table
  4. call a proc that takes the year as a parameter and moves all data

With that in mind, here’s how I set things up.

The SQL Server Side

I started by creating two tables.

CREATE TABLE Names
(   FirstName  VARCHAR(500),
     Gender     CHAR(1),
     NameCount  INT,
     YearInFile INT
);
CREATE TABLE Names2
(   FirstName VARCHAR(500),
     Gender    CHAR(1),
     NameCount INT
);

With these in place, I created a procedure:

CREATE PROCEDURE MoveNames @year INT
AS
INSERT Names SELECT FirstName, Gender, NameCount, @year FROM Names2;
TRUNCATE TABLE Names2;
GO

I tested these and they worked fine for my process. I want to load into Names2, then to get the year, I move the data to the other table, adding the year. There are probably better ways, but this worked fine for a relatively small load (few million rows).

PowerShell

With PoSh, I started with a simple loop. I tend to set variables first since this makes this easier to turn into a function.

$sourcefolder = “E:\Downloads\names”

$sourcefiles = Get-ChildItem $sourcefolder -Filter “*.txt”

foreach($file in $sourcefiles ){
   $yearinfile = $file.Name.Substring(3, 4)
   write-host “Loading File:” $file.Name ” year: ” $yearinfile

}

Running this gets me a list of all files along with the  years. That’s what I want, since I’m going to process each file and load it in with bcp. Thanks to Mike Fal for the outline of how I’ll do this.

Next, I set more variables at the top:

$InstanceName = “.\SQL2016”
$DatabaseName = “Sandbox”
$StagingTableName = “Names2”
$StagingProc = “MoveNames”

I’ll use these in this code. I call Invoke-Expression to run bcp and then Invoke-Sqlcmd to run my proc.

$cmd = “bcp ‘$DatabaseName.dbo.[$StagingTableName]’ in ‘$file’ -S’$InstanceName’ -T -c -t’,'”

Invoke-Expression $cmd
Invoke-Sqlcmd -ServerInstance $InstanceName -Database $DatabaseName -Query “$StagingProc $yearinfile”

From here, I put this in a file and ran it from my download location. The result:

2017-06-16 12_41_08-Windows PowerShell ISE

and from SQL Server:

2017-06-16 12_41_50-SQLQuery7.sql - (local)_SQL2016.sandbox (PLATO_Steve (56))_ - Microsoft SQL Serv

Now I can run some queries, and find out where I stand. #26, it turns out.

2017-06-16 12_43_35-SQLQuery7.sql - (local)_SQL2016.sandbox (PLATO_Steve (56))_ - Microsoft SQL Serv

About way0utwest

Editor, SQLServerCentral
This entry was posted in Blog and tagged , , . Bookmark the permalink.

2 Responses to Loading All CSV Files with PowerShell

  1. I really liked this post. There are a lot of data sets out there like this, and your handling of them was both instructive and had a nice punch line.

Comments are closed.