Archive

Archive for the ‘Utilities’ Category

Running Many Batch Statements in Parallel

2012-01-08 1 comment

Location: Somewhere over the Atlantic

imageWhen designing highly scalable architectures for modern machines, you will often need to do some form of manual parallelism control. Managing this is not always easy, but in this blog I will give you one piece of my toolbox to help you.

Let us walk through an example together, a tiny case study. This is a problem which many of you will be familiar with.

Let us say you have 16 files that you want to load into the same table in your database in an automated manner. The naïve approach will do something like this:

BULK INSERT MyTarget FROM ‘C:\temp\MyFile1′ WITH
  (FIELDTERMINATOR = ‘;’, ROWTERMINATOR = ‘\n’)

BULK INSERT MyTarget FROM ‘C:\temp\MyFile2′ WITH
  (FIELDTERMINATOR = ‘;’, ROWTERMINATOR = ‘\n’)

… etc…

BULK INSERT MyTarget FROM ‘C:\temp\MyFile16′ WITH
  (FIELDTERMINATOR = ‘;’, ROWTERMINATOR = ‘\n’)

Now here is the problem with this approach: it executes one statement at a time. Sequential execution is BAD, you need to stop thinking about the world like that if you want to scale on a modern architecture.

Lets assume we have enough hardware resources (in this case, it would take a blade server and a decent I/O system). What we really want is to run every one of these statements in parallel. Unfortunately, SQL server does not have a command to start up new connection from inside T-SQL… what to do?

Getting to the Command Line

Because you cannot execute more than one command on a single connection at a time, we will need multiple connections to SQL Server and this mean we have to go back to the command line. Let us start by creating a little batch file Worker.Cmd with this content:

REM Worker.Cmd File

CALL SQLCMD –S.\MyServer –q”BULK INSERT MyTarget FROM ‘C:\Temp\MyFile%1’ …EXIT

This allows us to invoke a bulk load for the first file by executing: Worker.Cmd 1

Unfortunately, we still cannot start multiple connections without manually firing up a lot of command prompts. The coders in the audience may at this point reach for their favorites programming language to write a little utility that can spawn multiple copies of an executable.

However, there is a problem with such a home made executable: you cannot generally rely on a server having the necessary runtime libraries. Typical comments might be:

 “No, we don’t have .NET 4.0 here, this is not yet certified by our infrastructure department. Could you recompile it for 1.1 please?”.

“Power Shell is much too fancy for us, what is wrong with running this on Windows 2000?”

Perhaps this customer is just skeptical about letting you run your executable on a server. This may sound silly, but I have seen this happen too many times to make assumptions.

Start to the Rescue

There is a very nice little utility for the good old command prompt that allows you to fire up new processes: START.EXE. This comes with all versions of Windows and it takes any command line executable as input, fires it up in a new thread and returns control back to the caller.

Using start.exe, we can write batch script that fire up multiple copies of the same executable. It looks like this:

REM SpawnMany.cmd

REM Author: Thomas Kejser
REM Purpose: Spawns many copies of the same executable. Useful for running many things in parallel

@ECHO OFF
ECHO Spawning %2 copies of %1

FOR /L %%i IN (1, 1, %2) DO (
    ECHO Spawning thread %%i
    START "Worker%%i" /Min %1 %%i %2
)

Each new process is started in a minimized window and we pass the thread number and the total number of threads to it. Using this little batch script, we can now do this:

  SpawnMany.exe Worker.Exe 16

This starts 16 workers, each with their own thread number assigned. Very useful for running stuff in parallel in a quick and dirty way. For example, I use this to run the TPC-H data generator dbgen.exe highly parallelized.

Notice that I added the EXIT command at the end of the worker.cmd batch. This makes sure that the window closes itself when done executing.

Summary

In this blog, I have shown you how to write a little batch script to fire up multiple threads, from the command line. each doing their own work in parallel. The script is “zero dependency” which makes it ideal for server use and for hacking together quick and dirty parallelism for test scenarios.

I mentioned that SQL server does not have a way to start up new connections from T-SQL. This is not strictly true. Sorry for leading you astray, but I wanted you to see how to do this from the command line first (and go through the pain Devil ). There is a way to hack SQL Server and implement a stored procedure I like to call sp_executesql_async. This will be the subject of a future blog, but since I am heading into a lab for a few weeks, you just have to wait for it.

Implementing MurmurHash and CRC for SQLCLR

2011-12-07 Leave a comment

As we saw in my previous post, the build in hash functions of SQL Server were either expensive with good distribution, or cheap, but with poor distribution. As a breath of fresh air, let us look at a useful magic quadrant:

 

image

We see that all of the hash functions exposed by HASHBYTES fall into low speed quadrants, caution is advised here. We also see that while CHECKSUM and BINARY_CHECKSUM are challengers, they do not have the spread to fully compete. We are left with modulo as the best possible hash function, but we believe that this algorithm may have trouble executing. The Modulo hash function has some problems (sorry, had to say “problem” instead of “challenge”, I couldn’t bear speaking management bollocks anymore):

  • It only works on integer types
  • If there is structure in the data (for example, if all values are equal) it will not spread the values equally
    The question we are faced with is:
    “Could we create a high speed, good spread, hash function that belongs in the leader quadrant and which does not have the limitation of the module functions”

It was this question I sought to answer when I started hacking away at SQLCLR functions. As you read the following, please don’t laugh too hard at me (a small snigger will suffice) – it has been a long time since I hacked away at code.

MurmurHash2 and Davy’s Bit Magic

Google has published a very interesting hash function, MurmurHash, claiming to have the properties I am looking for: It is both fast and has a good spread. The hash algorithm comes in different flavours: MurmurHash2 and MurmurHash3. Each flavour also exists in architecture and word size optimized versions. I decided to implement the x86/x64, 32-bit integer version in C#.

Fortunately, someone already beat me to it. Davy Landman has an implementation of MurmurHash2 – which he has kindly shared under GPL. I took his implementation and wrapped it in a SQLCLR user defined function.

While trying to understand Davy’s code and hacking it to work with SQLCLR, I had a few “aha moments” that I would like to share.

First of all, Davy uses this cute trick to turn 4 bytes from a BYTE[] array into a 32-bit INT:

UInt32 k = (UInt32)(data[currentIndex++] 
                  | data[currentIndex++] << 8
                  | data[currentIndex++] << 16
                  | data[currentIndex++] << 24
           
);

 

I thought that was a rather neat trick. To practice my illustration skills, here is what happens:

image

Please note that left shift (<<) and or (|) operators take INT32, not UINT32 as input. Hence the need to cast the final value to UINT32.

Speaking of integers, in my modification of Davy’s source code you will find:

unchecked
{
    return (SqlInt32)(Int32)h;
}

I need to get form the UINT which is returned by the reference implementation of MurmurHash to the signed integer (SqlInt32) that SQL Server understands. Doing the unchecked cast here is faster than Convert.ToInt32.

MurmurHash3

From the C++ source code of MurmurHash3, and using Davy’s bit trick, it took me only a few hours to get myself a nice implementation of MurmurHash3. I compared this with a C++ implementation done by my colleague Christian Martinez (based directly on the Google source), and we agreed on outputs that exercise all the branches in the code. So I am reasonably confident that my implementation is correct.

Except for the fact that it turns out to be important to use SqlBinary instead of SqlBytes (blogged here), there is not much more to say. Feel free to use the MurmurHash3 for your own implementations (my implementation is GPL)

The source code for the MurmurHash functions is too long to paste in this blog, so I created a new page on my site which you can find by following this link: C# Source code for MurmurHash in SQLCLR.

Before I show you the spread and speed results, let me just talk a little about two very old, but very commonly used, hash functions.

CRC16 and CRC32

The Cyclic Redundancy Check (CRC) family of hashes are, compared to MurmurHash, very simple to implement. They walk through each byte in the input and divide it with a specially selected polynomial. For implementation purposes, the polynomial is simply a constant in the code and the division is done with XOR and shifting (making this a very efficient operation). The remainder of the byte division is fed into the division on the next byte (giving the hash algorithm its name) and so on, until the end of the input .

WikiPedia contains a great reference implementation that I used for my tests. For readability, the source code is again available on a separate page of this blog: CRC16 and CRC32 in C#.

In the naïve approach to CRC, the polynomial division is implemented using an inner loop through all the bits of each byte:

rem = rem ^ data[i];

for (int j = 0; j < 8; j++)
{
  if ((rem & 0×00000001) == 0×00000001)
  {   // if rightmost (least significant) bit is set
     rem = (rem >> 1) ^ Polynomial32; /* Polynomial */
  }
  else
  {
    rem = (rem >> 1);
  }
}

An interesting optimization, shaving off a lot of cycles, is to create a table that allows you to look up the result of this loop for all 256 combinations of 8 bits. My source code contains such  tables, both for CRC32 and CRC16 implementations. Using these tables, I can replace the above code fragment with:

rem = (rem >> 8 ) ^ CRCTable32[(rem & 0xff) ^ data[i]];

…A major optimization

Speed Results

At this point, you may wonder if I did all of this to give you a chance to laugh at my rusty coding skills. What exactly was the point of implementing a new hash function in the first place?

Without further ado, let us look at the results compared to native SQL Server functions:

image

Isn’t that beautiful? If we don’t need the cryptographic properties of the HASHBYTES functions, we can beat SQL Servers hashes with our own SQLCLR implementation. Please note that I took out MD2 from the above to avoid skewing the results (I have previously shown that it is simply too inefficient).

Now, the bad news: the yellow line above. That represent a “hash function” that returns zero, like this:


public static SqlInt32 NoHash(SqlBinary data)
{      
  return (SqlInt32)0;
}

 

Notice that even such a simple user defined function that does nothing, consumes almost all the same number of CPU cycles as a full hash calculation. This is the overhead of the CLR and throwing data back and forth between managed and unmanaged code. So much for my optimizations in CRC.

Spread Results

We now know that there is at least hope of implementing a faster hash function than HASHBYTES. But how well do these functions spread the input. Running the chi-squared test from my previous blog entry we get These results:

image

Now THAT is pretty nice isn’t it? We see that good old CRC makes for an excellent hash function for our purpose. MurmurHash does very well too, comparable with SHA, but at under half the CPU cost.

Summary

In this blog, I have shown you how to implement hash functions that are both faster and have better spread than the built in hash functions exposed in SQL Server HASHBYTES. We have also seen that the overhead of SQLCLR is significant for this case, and hence, it would be preferable if non cryptographic hash function were exposed natively in SQL Server.

If you care about such new hash functions for SQL Server, I would suggest you file a Connect item to ask for them and state your purpose for wanting them. “It would be cool” or “Thomas says so” is not enough, provide a business justification. But at least, now you know what to ask for…

I owe Christian Martinez thanks for this blog for directing me to the MurmurHash functions, and for helping me validate the correctness of my C# implementation against his C++ version.

PS: As I was exploring this little implementation, I noticed that one of my heroes, Donald Knuth, has extended The Art of Computer Programming with a new volume: Volume 4, Fascicle 1: Bitwise Tricks & Techniques; Binary Decision Diagrams. That sounds like an interesting read and it is now on my birthday wish list.

Guest Posts Coming up

2011-09-20 2 comments

Lately, I have been discussing data modeling with a lot of people and there is simply so much ground to cover. Lots of exciting developments – it seems I was on to something when I started a blog about this topic.

To make faster progress, I have teamed up with Marcel Franke and  Cardory Van Rij.  I am very exciting about this development, We will be doing some blogging together in the near future, both here on blog.kejser.org and on their individual blogs.

Based on feedback on my previous post, I will be writing about Type2 dimensions and history tracking for the next post in the modeling series. It will be a little delayed, since I am traveling a lot the next month.

Stay tuned…

Categories: Modeling, Utilities Tags: ,

A Mobile Phone – How Hard can it be?

2011-08-11 1 comment

I have recently switched to iPad as my primary device for media viewing, music, book reading, email and to some extend blogging (though I still find Live Writer superior and even run VMware Fusion on my home Mac for that purpose alone). Because I no longer need a big screen on my phone, it is therefore time to go back to the old days.

Remember when phones would last two weeks on a single charge? Back when, thanks to the T9 dictionary, you could type almost as fast with one hand as a ten finger typist on a keyboard, and certainly faster than on iPad. Yes, tactile feedback rules – the brain latency of vision is too high. Remember when we were convinced that the next evolutionary step for mankind would be smaller thumbs, to type even faster? We didn’t have twitter back then, so we were forced to type real sentences that reflected full thought patterns – these things took many characters and long words.

So, I went searching for a phone to replace my legacy devices (I own both HTC Mozart and iPhone 3G). Feeling like a dinosaur stampeding into a crowd of intelligent design proponents – I walked bravely into the Nokia store in Heathrow. I clearly remember this wonderful brand for its legendary 6310i, I thought they should get the first stab at taking a sizable piece of my budget.

The friendly girl lit up immediately when I asked for "a phone that can make calls and send SMS, nothing else". It turns out that I am not the first customer to make that request. A sigh of relief for me. As much as I like to be bleeding edge in my job, I prefer technology in my personal space to be tested and sound – hassle free if you like. It turns out that Nokia is catering to the dinosaur market – with the C3-01 model. It can be yours for the meager price of 120 GBP – unlocked and subscription free.

Eagerly, I set out to try before I buy. Demo models were available in black and silver. The silver looks bling bling, the black one has a cool, carbon fiber feel to it. The phone still has a T9 dictionary and it only took my nervous system a few seconds to reacquaint itself with this phenomenal design. It has a fast address book too, navigated with the same T9 functionality. It syncs with Google contacts – which you will not get on a the 6310i. Sure, it does Facebook and Twitter too, but I can forgive it for those bugs. The screen is nice but a bit big. It is laid out nicely and easy to read, but why it is in colour and so large defeats me. Perhaps it is related to the 5M pixel camera, but who cares about that? As may be clear from my gravatar, I look just as horrible on pictures as in reality. It charges from micro-USB, no Apple vendor lock-in there – big plus. Bluetooth? Unfortunately yes, but it also has a jack for a wired headset to workaround THAT bug. The girl claims it can do 6 hours of non-stop talking and will last 3-4 days on a charge if receiving the occasional call. Not impressive, but i guess some battery has to fuel that damn colour display. You can install apps too, but I hope you don’t have to. As all true love affairs, you learn to love the small flaws in your partner, even a colour display and low stamina when fully charged. At this point, I was reaching for my credit card.

So why am I still dragging my makeshift, 20 USD, parallel imported, no-name, hacker unlocked device around? Why do I have 120 GBP too much on my bank account that are not contributing to saving our economy? Because I decided to try a few more things while waiting for my plane – and thank goodness it was late. I scouted for the settings on the phone; just to check if Nokia still does those cool profiles that allow you to customize how the phone behaves in a meeting room. It was at this point I discovered the deal breaker! The phone has a touch screen… Why? One of the points of an old fashioned phone is that you can operate it with one hand, and stuff your face with hamburgers using the other. But it doesn’t end there, the scrolling functionality of the menus…well… Let me put it nicely first: it leaves something to be desired. One might ask, why even scroll in the first place, mainframes did fine without it, and i dare say they do quite a lot more than a mobile phone. At this point I was acutely aware of my dinosaur opinions, so I asked the nice girl: "sorry, am I missing something here or did they f*** up the scrolling?". Our mutual smirk made it clear that this was another sentence she had heard before.

And this brings me to the point reflected in the title of this blog: when it comes to simple, personal devices with well understood requirements – there is NO EXCUSE to f*** up anything. When the poor scrolling of the Nokia touch screen makes me annoyed, when I wait until the next ice age for a Samsung Galaxy Pad to move to the next app pane – I have ask myself: if the engineers are this sloppy, or under this much time pressure, what else did they miss that is going to come back to haunt me? This is why Apple is worth more than the US, and why Linux is growing like crazy at Amazon Cloud Services – even though you have to manage it from a black and white SSH session. It is not just about passing the user and integration test. It is about engineering for excellence up front: do less, but do it well. It is about not accepting to ship things that are obviously wrong! Engineers are generally good people that want to do the right things. If only companies would let them work, and if us buyers would stop asking for all those checkbox features that we will never use: like big colour touch screens and the ability to run apps :-)

I will go on eBay to hunt for a repurposed Nokia 6310I now, so I can stop being angry and get back to blogging about databases. Thanks for listening.

PS: incidentally, you can now track updates to my blog on twitter. If you are too impatient to wait for the RSS feed or too busy waiting for your Android screen to scroll, my username is thomaskejser. But please, please use email or my blog comments to communicate with me – I don’t have the patience to acquire a smartphone.

Categories: Musing, Travel, Utilities Tags: ,

Latch and Spinlock Papers Published on Microsoft

2011-07-01 Leave a comment

I am happy to announce that my team mates, Ewan Fairweather and Mike Ruthruff have published two excellent whitepapers on latch and spinlock diagnosis. You can find them here:

Pay special attention to the spinlock scripts, you will find an interesting trace flag in there that sheds more light on my intentionally vague dodging of the call stack subject in my blog entry here. I am sorry that I did not provide more details in the blog, but I did not want to give the plot away Smile

I am very proud to have Ewan take over the OLTP tuning on the SQLCAT  team here in EMEA. Buy this guy a drink next time you meet him at a conference and have a chat with him about scalability.

The SSIS Balanced Data Distributor is released

2011-05-26 7 comments

Those of you who have seen my data loading presentation at SQLBits and PASS may remember that I recommend explicitly multi threading data loads to achieve higher load speeds.

This job has now become a lot easier from SSIS with the release of the Balanced Data Distributor (BDD). it is available here:

http://www.microsoft.com/downloads/en/details.aspx?FamilyID=ea0a1544-5ee4-4ad4-9d76-296d0632f162&displaylang=en

For max scale, you will still want to do some manual form of threading, but the BDD should help you achieve better scale for many common scenarios.

Categories: SQL Server, Utilities Tags:

Utility functions: fn_convert_to_base and fn_nums

2011-04-26 2 comments

I will often use code to illustrate my points in this blog. Because data generation is a big part of these examples, I will take the chance to introduce a few functions that I find useful for that. I will use these functions in my examples, so refer to the Utilities category on this blog to find the source. Most of these functions are adaption from other users of SQL Server and I will do my utmost to give credit where credit is due. If you feel you are the owner of the original idea – please send me an email so I can give you proper credit.

fn_convert_to_base is very useful for creating hex or binary representations of numbers. As we shall see, using the hex representation of a key can be good for illustrating some points. The version I use is based on the code by Leo Vindosola:

CREATE FUNCTION dbo.fn_convert_to_base
(
    @value AS BIGINT,
    @base AS INT = 16
) RETURNS VARCHAR(MAX) AS BEGIN  

    -- some variables  
    DECLARE @characters CHAR(36),
            @result VARCHAR(MAX);  

    -- the encoding string and the default result  
    SELECT @characters = '0123456789abcdefghijklmnopqrstuvwxyz',
           @result = '';  

    -- make sure it's something we can encode.  you can't have  
    -- base 1, but if we extended the length of our @character  
    -- string, we could have greater than base 36  
    IF @value < 0 OR @base < 2 OR @base > 36 RETURN NULL;  

    -- until the value is completely converted, get the modulus  
    -- of the value and prepend it to the result string.  then  
    -- divide the value by the base and truncate the remainder  
    WHILE @value > 0
        SELECT @result = SUBSTRING(@characters, @value % @base + 1, 1) + @result,
               @value = @value / @base;  

    -- return our results  
    RETURN @result;
END

Probably my all time favourite function is fn_nums that allows you to quickly generate a virtual table of integers. I use Itzik Ben Gan’s version:

CREATE FUNCTION [dbo].[fn_nums](@n AS BIGINT) RETURNS TABLE
AS

    RETURN
    WITH  L0 AS(
        SELECT 1 AS c
        UNION ALL
        SELECT 1)
        ,  L1 AS(
            SELECT 1 AS c FROM L0 AS A, L0 AS B)
        ,  L2 AS (SELECT 1 AS c FROM L1 AS A, L1 AS B)
        ,  L3 AS (SELECT 1 AS c FROM L2 AS A, L2 AS B)
        ,  L4 AS (SELECT 1 AS c FROM L3 AS A, L3 AS B)
        ,  L5 AS(SELECT 1 AS c FROM L4 AS A, L4 AS B)
        ,  Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY c) AS n FROM L5)
        SELECT n FROM Nums   WHERE n <= @n;

For pasting source code into the blog, I use the Code Snippet Paster from Omar.

I am not too happy about the syntax formatting though, doesn’t recognize all keywords and comments. Hope someone reading can recommend a better plug-in for live writer.

Categories: Utilities Tags:

Automating build of Integration Services projects

2006-11-29 Leave a comment

After our experiences with the build process in Analysis Services – expectations were low when we turned our gaze towards Integrations Services (SSIS).

Our overall experience with SSIS has generally been positive – it’s an extremely powerful tool with some interesting performance characteristics. There are some annoying flaws – especially in the Visual Studio integration. When I work with SSIS projects – three or four daily crashes of Visual Studio is not uncommon. Never fear – the power provided by the tool more than makes up for this when it comes to choosing the ETL platform. And buggyness is what service packs are for … isn’t it? :)

The first thing you notice when you choose "build" in Visual Studio on an SSIS projects is that Visual Studio doesn’t do anything – except of course copy files to the bin directory. Why? Well, guess you could say that the *.dtsx files contain all the information need to run the packages – there is "nothing to be done".

But, when you try to automate a build there actually is something to be done. First of all: you have to mark which files contain the build output. That is, the files you wish to copy to your deployment target. With this in mind – we tried using msbuild on the project – hoping that it would at least mark the *.dtsx files as build output (it could just take a look at the bin directory for us) so Team Foundation can copy them to the release directory. No surprises – integration is non existent and there are no free meals here.

Guess you could say that the command line to automate build of SSIS is simple: just copy the bin\*.dtsx files to your build output directory.

We could have stopped here and be done with it – but we like build logs (if you can use the about the word: "like" in this context). Those of you who have deployed a big BI-installation by hand may have noticed the excellent command line tool; dtexec.exe used to run SSIS packagesThis tool has some very useful parameters that come in handy for the structured developer.

When you execute an SSIS package - two things happen:

  • The package is validated. During this phase, connection managers and logs providers are tested and metadata for all tasks and flow components is checked for inconsistencies
  • The package is executed. There seems to be some sort of compilation going on – part of this is probably done in the validate phase. After this compilation – the package is executed. The compilation part is an educated guess – I cannot believe SSIS can be as fast as it it without first compiling the package to machine code.

Here’s the trick: The first phase produces both warnings and errors if you have made a mistakes in your package design. This is a very useful output and can be considered a build log of the package. Fortunately – its possible to invoke the first phase without executing the entire package. You simply do this:

dtexec /F MyPackage.dtsx /Va /Rep EW

/Va makes the package validate but not execute

/Rep EW tells SSIS that you only want to see warnings and errors on stdout. If you don’t use the /Rep switch you will clutter your "build log" with useless informational messages.

Summary: to build an integration services project you do two things:

  • Execute dtexec with proper command line switches on all your *.dtsx files. Save stdout from these command lines to your build log.
    • If the build log contains any errors – the sadistic among you may want to email notificaty to the responsible developer. It gives him a chance to get up in the middle of the night and fix the bug :-)
  • Copy all bin/*.dtsx files to you build output directory

Automating build of Analysis Services Projects

2006-11-28 2 comments

As you may be aware, an Analysis Services project consists of the following files:

  • A project file (*.dwProj)
  • One or more  Data sources (*.ds)
  • One or more Data source views (*.dsv)
  • A database file (*.database)
  • One or more cube files (*.cube) with their partition (*.partitions)
  • One or more dimension files (*.dim)

You can right click the project in Visual Studio and choose "build". What happens now seems to be a lot of called to DLL files executed by the integration between Visual Studio and the Analysis Services. The output of this build is the bin folder in your project directory. This folder contains a database file (*.asdatabase). The .asdatabase file is the one you want – you will see why later.

Unlike other projects – which you can build using their proper compiler (Example: CSC.EXE, VBC.EXE) – Analysis Services does not seem to have a compiler executable.

You could be tempted to run msbuild. Unlike most temptation this will result in disappointment – msbuild does not understand Analysis Services files.

The only way I have found to automate the building of Analysis Services projects is to invoke Visual Studio from the command line. Say you have a project called: MyAsProject.dwProj. To build it and obtain the asdatabase file you execute the following:

devenv MyAsProject.dwProj /rebuild Relase /project MyAsProject

So far so good – now you have automated the createion of the .asdatabase file.

You are not done yet. The asdatabase file provides the input to the Analysis Services Deployment Wizard. The spell cast by this wizard allows you to transform an asdatabase filen into an XML/A file. This XML/A file can be run on your deployment target – which will create the cube database.

From the command line the Deployment Wizard can be run in two modes.

The answer mode:

Microsoft.AnalysisServices.Deployment.exe /a

In this mode, you get to choose all the server specific settings and write them back to your bin directory. Check out the BOL documentation to find out exactly how this is implemented (You will find no defense for the naming of the executable in the documentation)

Once you have your answers stored (which you will only do once) you want to automate the creation of the XML/A file. Execute:

Microsoft.AnalysisServices.Deployment.exe MyAsProject.asdatabase /d /o:MyProject.xmla

Voila! You have your xmla file, which you can copy to your deployment target and run. By the way. xmla files can be run from the command line using ascmd.exe which you can find here: http://msdn2.microsoft.com/en-us/library/ms365187….

Automating the building of BI-Solutions

2006-11-28 2 comments

Coming from a "Old fashioned" development environment (that is: C/C++, CSV, Unix and the likes) I have always wondered why BI-projects don’t seem to get the idea of a structured software development.

There are some technological cornerstones of a good, well structured software development process:

  • Source control systems – Never loose code again, unlimited undo. Tag your releases
  • Automated, daily builds – Make sure you project is ready to deploy and release. Never again spend hours on manually building your project
  • Bug tracking and issue logs – Speaks for itself. A necessary tool for proper project management
  • Multiple environments - At the very least: Development, Test and Production.
  • Change management - Well documented and scripted procedures for moving your code into production

I don’t know about you – but I have yet to see a Business Intelligence projects use all of the above. Some, actually a lot, of the examples I have worked on did not even use ANY of the above. My theory is that BI-development community is in a sense "immature" to the ways of structured software development.

With Team Foundation Services and the new Visual Studio integration of the Microsoft BI-suite this is about to change. On my current development project I have decided to try a fully structured approach to BI-development with my team.

Team Foundation gives us the following for free (or rather – for a very cheap license):

  • Source control – Microsoft finally got it right. Source Safes successor is brilliant and has the multiple checkout features and proper merge functionality I missed from CVS (another, very good, source control system). Ohh, and did I mention that the source control database does not seem to spontaneously break anymore?
  • Bug tracking and issue logs - Working as intended – and working well. Integrates seamlessly with both Visual Studio and the source control. This means that my developer team can actually be bothered to use it.

We have set up multiple environments and I have borrowed a few change management procedures from my past.

Now, Team Foundations promises to automate our daily build too. It turns out that this is not really working with BI-projects. Team foundation uses a tool called msbuild – which it will execute automatically on solution files. However, msbuild doesn’t have any plug-ins for BI-projects :-(

Guess it was a bit too much to ask for anyway – back to basics: scripted builds. We are experimenting with daily build using command line script. Once we have the "building blocks" to build a BI-solution we plan to integrate it with the Team Foundation Build.

Follow

Get every new post delivered to your Inbox.

Join 271 other followers