Posterous theme by Cory Watilo

Powershell Deployment Script Update

Since last weeks post on automated DB and ETL deployment builds with Powershell, I’ve made a couple of small but useful changes to the Powershell script.

The distinct parts of the script are now encapsulated in parameterised functions. The end goal of this is to be able to turn the main script into a Powershell Module that could be installed and used when required. So now there are three functions, GetHeadFromSVN, SyncDB and DeployPackages.

DeployPackages has been changed to make it quicker and more streamlined. It will now only deploy or re-deploy packages that have been changed in the last hour, rather than every package in the repository.

The functions are as follows:

# Get latest DB scripts and SSIS packages from SVN
function GetHeadFromSVN
{
    param($svnpath, $svnurl, $destinationfolder)

    Set-Location $svnpath
    svn checkout $svnurl $destinationfolder
}

# Use Red Gate SQL Compare to deploy schema changes to each DB
function SyncDB
{
    param($sqlcomparelocation, $argfile)

    Set-Location $sqlcomparelocation

    Invoke-Expression ".\sqlcompare /ArgFile:$argfile"
}

# Deploy SSIS packages
function DeployPackages 
{
    param($filepath, $ssisfolder, $server, $dtutilpath)

        $dtutil = $dtutilpath
        Set-Location $filepath

       # Only deploy packages that have been changed in the last hour
        $files=get-childitem *.dtsx | where {$_.LastWriteTime -gt (Get-Date).AddHours(-1)}

        if ($files -ne $null) {
        foreach ($file in $files) {
            $basename=$file.BaseName
            $deletearguments = " /delete /SQL $ssisfolder/$basename /SourceServer $server"
            $deployarguments = " /file $file /copy SQL``;$ssisfolder/$basename /destserver $server"

            Set-Location $dtutil

            Invoke-Expression "dtutil $deletearguments"
            Invoke-Expression "dtutil $deployarguments"
         }  
    }
}

Here are some examples of how they would be called:

GetHeadFromSVN "C:\Program Files\SlikSvn\bin" http://<svnserver>/repos/bi/Database "C:\NightlyBuilds\BI\SVN\Database"
GetHeadFromSVN "C:\Program Files\SlikSvn\bin" http://<svnserver>/repos/bi/ETL/branches/Phase2 "C:\NightlyBuilds\BI\SVN\ETL"

SyncDB "C:\Program Files\Red Gate\SQL Compare 9" "C:\NightlyBuilds\BI\Configuration\StagingCompare.xml"

DeployPackages "C:\NightlyBuilds\BI\SVN\ETL\SourceToStage" "SourceToStage" "<target server>" "C:\Program Files\Microsoft SQL Server\100\DTS\Binn\"
DeployPackages "C:\NightlyBuilds\BI\SVN\ETL\StageToWarehouse" "StageToWarehouse" "<target server>" "C:\Program Files\Microsoft SQL Server\100\DTS\Binn\"

The Codeplex project page has been updated with the latest code.

Automating DB and SSIS builds from source control

This week I’ve been working on a new branch of a data warehouse project. To make deployment to development environments easier I decided I’d attempt to get some sort of continuous integration/automated builds to our dev server up and running. Doing this sort of thing has never been particularly straightforward for database projects, but with the fantastic SQL Compare command line tools and SQL Source Control from Red Gate, this task is made so much easier.

I wanted something relatively simple (as always), something that meant not having to install a framework. I did look at TeamCity, NAnt and MSBuild, but decided against using any of these, at least in the short term. My reasoning being that something light and fairly portable with as few dependencies as possible would be more useful at this stage.

The intention was to set up a nightly build job that would:

  1. Get the latest code from the development branch from SVN
  2. Deploy any schema changes to the development databases
  3. Delete and re-deploy the SSIS packages for the ETL

I used the excellent Continuous Integration article from Red Gate to get me going. I had started to implement this using a batch file (yuk!), but quickly moved onto using PowerShell, my new favourite toy on Windows! The examples in each section will show the PowerShell commands used in the script.

Setup and Configuration

Working locally initially, I created the following folder structure on the C drive:

  • NightlyBuilds
  • — BI (the project name)
  • —— Configuration (to hold any configuration files)
  • —— SVN (for the SVN checkout)
  • —— Output (for any output or report files)

Source Control

Our development team use dedicated local databases and push changes to the centralised Subversion (SVN) repository using Red Gate SQL Source Control. The first step in the build was to check out the latest SQL schema scripts and SSIS packages. After installing the SlikSVN client, I could use the following Powershell commands to perform the check out:

Set-Location "C:\Program Files\SlikSVN\bin"
svn checkout http:///repos/BI/Database "c:\NightlyBuilds\BI\SVN\Database"
svn checkout http:///repos/BI/ETL/branches/Phase2 "c:\NightlyBuilds\BI\SVN\ETL"

Deploy Schema Changes

Next up was to use the SQL Compare command line tools to compare each of the databases on the development environment with the schema scripts.

Set-Location "c:\Program Files\Red Gate\SQL Compare 9"
Invoke-Expression ".\sqlcompare /ArgFile:C:\NightlyBuilds\BI\Configuration\EDWCompare.xml"
Invoke-Expression ".\sqlcompare /ArgFile:C:\NightlyBuilds\BI\Configuration\StagingCompare.xml"

At the moment, I’m using XML argument files for each comparison (this may well change as this progresses), these have a format like:

C:\NightlyBuilds\BI\SVN\Database\EDW\branches\Phase2
server name
    EDW
default
C:\NightlyBuilds\BI\Output\EDWSchemaDiffReport.html
Interactive
C:\NightlyBuilds\BI\Output\EDWSchemaSyncScript.sql
user
role

Deploy SSIS Packages

With the databases being up to date, the next task was to deploy the SSIS packages. Having experienced a couple of issues in the past with packages not being overwritten correctly, I decided that, to be on the safe side, I’d delete an existing package before re-deploying it. The command line tool dtutil.exe lets us perform these tasks and more, but it can be incredibly frustrating to use at times, with it’s inconsistent syntax and problems with differences in SQL Server versions and 32 and 64 bit versions. Using the following function, the path to the SSIS packages is specified and each one is deleted on development, then redeployed.

function DeployPackages 
{
    param($filepath, $ssisfolder, $server)

    Set-Location $filepath

    $files=get-childitem *.dtsx

    foreach ($file in $files) {
        $basename=$file.BaseName
        $deletearguments = " /delete /SQL $ssisfolder/$basename /SourceServer $server"
        $deployarguments = " /file $file /copy SQL``;$ssisfolder/$basename /destserver $server"

        Set-Location $dtutil

        Invoke-Expression "dtutil $deletearguments"
        Invoke-Expression "dtutil $deployarguments"
    } 
}

DeployPackages "C:\NightlyBuilds\BI\SVN\ETL\SourceToStage" "EDW/SourceToStage" ""
DeployPackages "C:\NightlyBuilds\BI\SVN\ETL\StageToWarehouse" "EDW/StageToWarehouse" ""

Improvements

So that, in a nutshell, is the process I’m currently using. It works, which is nice, but I have quite a few improvements in mind:

  • Firstly, it seems to be very slow when running dtutil. I’m not sure if this is because I’m currently doing everything over a VPN, but I’ll be looking to improve the general performance. To be honest, it’s not a major issue, as it’s mainly designed for builds that happen overnight, but it would be nice if I could streamline it. I did try to use the /fdelete switch on dtutil to remove an entire folder, but that demands that the folder is empty.
  • Add more parameterisation so that it could be used for different projects on different environments
  • Use the SQL Data Compare command line tools to ensure any static/lookup data is up to date
  • Add an option to drop (if required) & create the databases and SSIS folders. This would allow the script to deploy to “virgin” machines and test the entire build.
  • Add job creation scripts
  • Build in SSRS report deployment. The project I’m working on does not use a great deal of SSRS reports (OBIEE is the end user tool), but given that report deployment can be a pain, it would be nice to include this too.
  • Add some reporting as to the success or failure of any component of the script.

Update: Looks like the VPN was indeed causing a slowdown. Running it connected locally to the network, 60+ SSIS packages were deployed in <3 minutes.

Code

I’ve started a project on Codeplex for this, really just as a place to keep the code. Any suggestions or comments are, as always, more than welcome.

Deleting from Excel in SSIS - a workaround

Here's a quick how-to post, a little work-around for an issue with writing to Excel from SSIS.

I've been asked a few times how to use SSIS to delete data from an Excel spreadsheet, then write data into that blank spreadsheet, while maintaining the column headers in the first row.  There is no particularly straightforward way of doing so in SSIS, as attempting a delete statement on the Excel sheet will either remove the data in all rows (including the header) or throw an error as the DELETE statement is not supported.  There is a method that I use, which I will step through here.

The basic idea is to do the following on each execution of the package:

1. Delete the XLS file that is being written to

2. Re-create the XLS using a CREATE TABLE statement

3. Load the data into the empty XLS

In order to do this, start by creating the Data Flow for step 3 and ensure that this works as expected (Figure 1 is a simple example).  This will create the XLS that will be written to.  Now we have the structure, we can use this to re-create the XLS using a SQL statement.  To do so, drop an Excel Source onto the flow (pointing to the destination file) and then an Excel Destination, and link the output from source to destination.  Open the Excel Destination and click the New button next to 'Name of the Excel sheet' - this will pop up a window with the CREATE TABLE statement that will be used in step 2.  Copy this and delete both the Excel Source and Destinations as they have served their purpose for this exercise.

On the Control Flow, add an Execute SQL Task, with the ConnectionType set to Excel, and the Connection set to the destination Excel connection manager.  Paste in the the CREATE TABLE script from the previous step into the SQL Statement property value.  This will create the XLS at run time.

Next, add a File System Task, with an Operation of Delete File and the Source Connection set to a new File Connection Manager, also pointing to the destination XLS.  

Finally, set up the Control Flow to execute the delete, then execute the CREATE TABLE, then perform the Data Flow, as shown in Figure 2.

One issue I found with this method: the Execute SQL Task connected to an Excel Connection Manager did not work correctly if the Excel version was Microsoft Excel 2007 (.xlsx extension).  Changing this to Excel 97-2003 (.xls) resolved this issue.

SSIS 2008 Deployment Manifest error

I had an issue this morning whereby I was trying to deploy an SSIS 2008 project using the Deployment Manifest, and was hitting the following error:

Could not save the package "<Package Path>" to SQL Server "<SQL Server>".

------------------------------
ADDITIONAL INFORMATION:

The package failed to load due to error 0xC0011008 "Error loading from XML. No further detailed error information can be specified for this problem because no Events object was passed where detailed error information can be stored.". This occurs when CPackage::LoadFromXML fails.

A quick Google search indicated that the issue was to do with a SQL Server 2005 installation that had been upgraded to 2008.  Right-clicking the Deployment Manifest file showed two options for the Installation Package Utility.  Even though both said 2005, one was actually pointing to the 2008 version.  Choosing this one corrected the problem.  The default location for the 2008 version is 

C:\Program Files\Microsoft SQL Server\100\DTS\Binn\dtsinstall.exe

You can also run this via the command line:

C:\Program Files\Microsoft SQL Server\100\DTS\Binn\dtsinstall.exe "C:\<project path>\ProjectName.SSISDeploymentManifest"

T-SQL Tuesday: Return a range of dates between 2 days

Results

 

Just after Adam Machanic announced the first of the T-SQL Tuesday blog events, I was asked a question about how to return a list of all the dates between two given days.  I came across some good forum posts about how to do so, many of which utilised numbers tables.  I wanted to avoid that (nothing against them, just wanted something that was stand-alone), and figured that a CTE would be the best way to go; in fact someone had posted such a method, a little of which I have used here (if I can find the post again I'll give them a mention in the comments!).

The following procedure takes 3 parameters: a StartDate, an EndDate and a bit flag to say if the range should be inclusive (that is, will it also return the StartDate and EndDate in the results), defaulted to 1.  A list of all the dates between the first two parameters will be returned.

The maxrecursion option has been set to 3660, giving a maximum of ten years dates to return.

create procedure dbo.GetDateRange
@StartDate datetime,
@EndDate datetime,
@Inclusive bit = 1
as

set nocount on;

declare @DayDiff smallint;
set @DayDiff = datediff(dd, @StartDate, @EndDate);
if @Inclusive = 0
     set @DayDiff = @DayDiff - 1;

with cteRange (DateRange)
as (
     select dateadd(dd, datediff(dd, 0, @EndDate) - @DayDiff, 0)
     union all
     select dateadd(dd, 1, DateRange)
     from cteRange
     where dateadd(dd, 1, DateRange) < (@EndDate + @Inclusive)
     )
select DateRange
from cteRange
option (maxrecursion 3660);

go

Run the procedure using the following code:
exec dbo.GetDateRange '2009-12-07', '2009-12-12'

and the results are as shown in the image at the top of the post.

Hope some of you find this useful.  This post is part of Adam Machanic's T-SQL Tuesday.

Identify the T-SQL being run by a SPID

This is one I used to use a lot, and had cause to use it this morning.  An ETL process to truncate a staging table was being blocked by another process on the server, and I needed to identify exactly what command the blocking process was attempting to execute.  I found the SPID by executing sp_who2, which showed me the SPID that was blocking the truncate command (in this case SPID 54).  I then executed the following command to find out exactly what command was causing the block to happen:

DBCC inputbuffer(54)

The result set returned shows the actual T-SQL that was last executed for that SPID, which was a very large insert from a remote PostgreSQL database. 

This is a quick and useful method for tracking down issues and investigating what is happening on your server.

SSIS and Tinyint datatype

I've had a couple of issues recently with using the tinyint datatype in SSIS.  In one case I have a Lookup Transform which joins to the incoming data flow path on a column defined as a tinyint in the database.  The column metadata in the data flow is DT_I4, a four-byte signed integer.  Dragging this across to the tinyint column in the Lookup gives me the following error:

The following columns cannot be mapped:
[DataFlowColumn, LookupColumn]
One or more columns do not have supported data types, or their data types do not match.

By process of elimination, I discovered the incoming column needs to be converted to a DT_UI1, a single-byte unsigned integer.  One to bear in mind for the future!

SQLBits IV

I travelled down to Manchester last weekend to go to the SQLBits conference, something that I'd heard a lot of good things about.  I'm very glad I went, as it was an extremely useful and informative event.  And free!  :)

With 28 sessions being given by 28 speakers, there was plenty of choice on what to go to...possibly too much as I missed out on a few that I would have really liked to have seen.  I decided to mainly stay on the Business Intelligence track, and went to the following:

  • Data Warehouse Data Modelling Vincent Rainardi
  • Instrumenting, Monitoring and Auditing of SSIS ETL Solutions Davide Mauri
  • SSIS Nuggets Live Jamie Thomson
  • Data Warehousing features in SQL 2008 James Rowland-Jones
  • Designing Effective Aggregations in Analysis Services 2008 Chris Webb
  • Data Mining and SSIS - makes sense Allan Mitchell
All of which were very well presented and pitched at just the right level.  Personally I found Davide Mauri's session the most interesting as I've been looking at instrumentation and auditing of SSIS packages a lot recently, and his innovative DTLoggedExec tool (which can be downloaded here) is something I'll be looking at implementing very soon.

I'll certainly be back next year, as I'm sure there will be a SQLBits V.  If you are a SQL Server professional then get yourself along, it is well worth it.

"Full" SQL Server in the cloud

Now this is a bit interesting!  On Tuesday, the SQL Data Services team announced that SDS would offer full relational database capabilities.  This has really perked my interest now, as I was never fully comfortable with the ACE (Authority, Container, Entity) data model.  Now that SDS is going to support Stored Procedures, Views, Indexes and even Triggers, I think this is a product to get excited about.

Of course, the announcement means I still have more questions than answers!  For starters; when will we get to start playing with the product?  How is it licenced?  Will it work out the box with SSMS?  How will it interact with SSIS and SSRS?

Read the announcement here: SDS Team Blog