Run Twice Safe SQL for Migration

I've had the pleasure of working in multi-tiered environments for much of my time as a developer. Code is written in development, moved to a test environment, or two or three, and then migrated to production with a release. Each environment has its own application and database servers.

To facilitate migration between these environments, all SQL is put into a task-specific script file, and committed to source control. Since it is possible for a script to be executed more than once per environment, given the need for bug fixes or feature creep, the SQL needs to be written to be run-twice safe.

Run-twice safe means I can safely run the same script in the same environment multiple times without entering duplicate data or reaching an exception due to already extant schema changes.

Since I work primarily with MS SQL Server, my example below is written in T-SQL. However, the concept should work with almost any RDBMS.

The keys to run-twice safe are:

  • Try/Catch
  • Transactions
  • Conditionals
  • Error checking

Try/Catch

The entire script is wrapped in a try/catch block to allow us to catch and gracefully handle any exceptions which may arise.

view plain print about
1BEGIN TRY
2 // do stuff
3END TRY
4BEGIN CATCH
5 // handle the error
6END CATCH

Transactions

The functional portion of the script is wrapped in a transaction block. If one part of the script fails, we want to roll the entire thing back. Since the failure is an exception, the catch block will handle the rollback.

view plain print about
1BEGIN TRY
2 BEGIN TRANSACTION
3 // do stuff
4 COMMIT TRANSACTION
5END TRY
6BEGIN CATCH
7 IF @@TRANCOUNT >
0
8 ROLLBACK TRANSACTION
9 // handle the error
10END CATCH

Checking to see if a transaction is open before attempting a rollback is a safety valve. If you are worried about multiple transactions being open, the rollback could be rewritten to be

view plain print about
1WHILE @@TRANCOUNT > 0
2 ROLLBACK TRANSACTION

Conditionals

Before adding, subtracting, or updating anything, whether data or schema-related, perform a check to see if that bit of work is needed. For example, before adding a new table, check to see if it already exists.

view plain print about
1IF NOT EXISTS (
2 SELECT table_name FROM information_schema.tables WHERE table_name = 'sometable'
3BEGIN
4 // create the table
5END

For schema changes, I heavily depend on the INFORMATION_SCHEMA views. It just makes my life that much easier. For data changes, I look to see if the value I want to change has already changed, or even exists.

view plain print about
1IF NOT EXISTS (
2 SELECT somecolumn FROM sometable WHERE somecolumn = 'new value'
3)
4BEGIN
5 // insert/update the value
6END

Error Checking

Finally, if an error does happen, then we need to remember to throw it back to the console, as the try/catch block will swallow it. Using a RAISERROR statement, for T-SQL, in the catch block ensures we know what went wrong.

The Persistant SQL UDF

I came across a very strange and unexpected problem with a SQL Server 2005 UDF yesterday that continues to puzzle me. First, the cast of players:

  • ColdFusion MX 7
  • SQL Server 2005
  • A table-valued function
  • Puzzled developers

In case you are not familiar with table-valued functions, they are functions in SQL that return a table, as opposed to the more traditional scalar value.

We had to change the data type for one of the columns in the returned table from decimal to text to fix a bug. Nothing a simple ALTER FUNCTION statement could not handle.

Script, run, test, same error, what?

Maybe something is cached on the SQL Server side. Let's try dropping and recreating the function.

Script, run, test, same error, ???

Something very odd was happening. Only after restarting the ColdFusion Application Server service did the error go away.

I am still searching for an answer. Surely ColdFusion does not cache database object information...?

Version Control for Database Objects?

I have used a variety of version control methods and products over the years. In the beginning, I (infrequently) added a digit or date to the end of the file name, an effort that was largely ineffective and highly subject to user error. :-) Then, corresponding with a job change, I was introduced to version control software in the shape of CS-RCS from ComponentSoftware, based on GNU RCS. After a couple of years, the development team made the shift to Subversion, my personal favorite. To be complete, I will throw in a year of pain using Visual SourceSafe.

[More]

Finding All Triggers in SQL Server

Database triggers are sometimes difficult to track down. I needed to locate a list of all of the triggers in a database to check them for performance issues, and, when I turned to the INFORMATION_SCHEMA views, discovered there is not one for triggers. Oh well, back to interrogating the system tables.

Triggers are located in the sys.triggers table, so the following query delivered the information I needed:

view plain print about
1SELECT
2    o.name AS parent_object,
3    t.name AS trigger_name,
4    c.text AS trigger_def
5FROM sys.sysobjects o
6    INNER JOIN sys.triggers t
7        ON t.parent_id = o.id
8    INNER JOIN sys.syscomments c
9        ON c.id = t.object_id

Finding a Random Row

Every now and then I get a requirement to do something to a random table record. The most common is to retrieve a random record for display. Other times it is part of a data scrambling routine. So how is this done?

Most database platforms have a function that generates a unique identifier. SQL Server has NEWID(), which generates a GUID:

view plain print about
1SELECT NEWID()
2
3Example Output: 0B7869E5-1428-4CA9-9D5A-CDA0E9756DC2

MySQL has RAND(), which generates a random floating-point value:

view plain print about
1SELECT RAND();
2
3Example Output: 0.93845168309142

Using these functions, retrieving a random row from a table becomes as simple as:

SQL Server

view plain print about
1SELECT TOP 1 *
2FROM mytable
3ORDER BY NEWID()

MySQL

view plain print about
1SELECT *
2FROM mytable
3ORDER BY RAND()
4LIMIT 1;

So what is really going on behind the scenes? The key to the randomness is the ORDER BY clause. As each row in the table is evaluated, it is assigned a unique value via the NEWID() or RAND() function. The value changes for each row and each time the query is executed.

Coding Standards

I admit, I am a code diva. I want to see and write clean, clear, prettified code. I firmly believe that all programmers (and teams) should have, and more importantly follow, a set of coding standards. A few of my personal CF quirks are:

  • Tabs not spaces for code indentation.
  • Functions are camel-cased, e.g., querySetCell and arrayToList.
  • Variables are camel-cased and descriptive, e.g., currentRecord and dateOfBirth.
  • Component names are capitalized, e.g., User and UserService.
  • SQL keywords are upper case, e.g., SELECT username FROM users.
  • Use implicit boolean function results, e.g., .
  • Use structKeyExists over isDefined whenever possible.
  • And so on.

The important concept behind coding standards is not necessarily the nitty-gritty details of what is good and what is bad, but the discipline it provides your coding efforts. Consistency in coding is more important than the (endless) debates about which code formatting template is more readable or which function is better. If you cannot read your own code because you never structure it the same way twice then you have only yourself to blame.

Building a Date Table with ColdFusion

If you have ever worked with a data warehouse, or a complex, date-sensitive application, then you have probably come across a date dimension table. Simply put, a date dimension table is a lookup table that contains one record per date. Each record has information about that date, such as the day of the week, the quarter it falls in, and so on. Many DBAs come armed with a SQL script that creates and populates a date dimension table. If you are not a SQL adept, why not use the power of ColdFusion?

Each record in a date dimension table describes a date, so our table will have the following columns:

  • date - the full date
  • year - the date's year
  • month - the date's month number
  • month_desc - the month's name
  • day - the date's day
  • quarter - the date's quarter within the year
  • day_of_week - the date's day of week number
  • day_of_week_desc - the day of week's name
  • day_of_year - the date's day number within the year
  • week_of_year - the date's week number within the year

In SQL Server:

view plain print about
1CREATE TABLE days (
2    [days_id] int NOT NULL IDENTITY(1,1) PRIMARY KEY,
3    [date] smalldatetime NOT NULL,
4    [year] tinyint NOT NULL,
5    [month] tinyint NOT NULL,
6    [month_desc] varchar(10) NOT NULL,
7    [day] tinyint NOT NULL,
8    [quarter] tinyint NOT NULL,
9    [day_of_week] tinyint NOT NULL,
10    [day_of_week_desc] varchar(10) NOT NULL,
11    [day_of_year] tinyint NOT NULL,
12    [week_of_year] tinyint NOT NULL
13)

Now that the table is created, it needs data. Populating the data is simply a big loop using the many different CF functions for splitting and formatting dates. In this case, I will generate a script that can then be run in a database query tool, such as Query Analyzer:

view plain print about
1<cfset currentDate = createDate(2000, 01, 01)>
2<cfset targetDate = createDate(2008, 12, 31)>
3
4<cfloop condition="dateDiff('d', targetDate, currentDate) LTE 0">
5    <cfoutput>
6        INSERT INTO days (date, year, month, month_desc, day, quarter, day_of_week, day_of_week_desc, day_of_year, week_of_year)
7        VALUES (
8            '#createODBCDate(currentDate)#',
9            #year(currentDate)#,
10            #month(currentDate)#,
11            '#monthAsString(month(currentDate))#',
12            #day(currentDate)#,
13            #quarter(currentDate)#,
14            #dayOfWeek(currentDate)#,
15            '#dayOfWeekAsString(dayOfWeek(currentDate))#',
16            #dayOfYear(currentDate)#,
17            #week(currentDate)#
18        )
19    </cfoutput>
20    <cfset currentDate = dateAdd("d", 1, currentDate)>
21</cfloop>

In the above example, the date dimension table will contain one record for each day between January 1, 2000 and December 31, 2008. In the real world you will probably use a longer timespan, depending on your needs.