What is metadata, and why does it matter? Research suggests that knowledge workers spend at least 30% of their work time just searching for the right information, to say nothing of the time spent analyzing or manipulating it.
I have seen this scenario repeatedly play out: “A developer who used to work here developed a database which is still in production use daily, but nobody 100% understands some of the more vague columns/fields in a lot of those tables. It never got documented and now that we are a bigger company, we’re trying to make sense of both the data structure and the code that depends on it.”
It’s so common, it’s cliché. The structure of the tables was put together by a developer with good intentions, but the column names can be very vague in the context in which they are used: ‘Price’ or ‘Name’, for example. That developer knew what she meant, but no one else does! Perhaps the company’s evolving business model is outgrowing and outpacing the original data model, or the column names were simply poorly chosen. Should ‘Price’ have been named RetailPrice or WholesalePrice? Should ‘Name’ have been LastName, FirstName, MiddleName, FullName, TechnicianName, or CustomerName?
Sometimes the name of the table containing the column gives enough context; more often, it doesn’t.
With ‘Name’ you might figure it out by looking at the actual data stored, or at the stored procedures that manipulate it. With ‘Price’, it might take much longer as you trace the calculations in various stored procedures and ask business users how they understand the intent behind the stored data. Sometimes you will never get a straight answer. For ‘Percentage’, was that a discount percentage (retail? wholesale?), an upcharge percentage, or a commission percentage? You’re spec’ing or writing a tax report: is that ‘decTax’ a decimal-based column for use tax or for sales tax?
It’s the business’ accountability to give semantics, or meaning, to data, but it’s the responsibility of information workers to apply that meaning through metadata. Somewhere in the middle is usually a business analyst working to ensure that the requirements for software creation or change are written correctly, and that meanings are captured and translated correctly for maximum understanding by IT. Sometimes there is no analyst, and it all falls on a developer’s shoulders. The importance of this semantic layer in application development cannot be overstated.
Fortunately, the solutions are not hard to implement
These time-consuming reverse-engineering problems are very avoidable at the start with a little discipline. If they weren’t avoided in the first place, they can still be mitigated over time with tools that already exist (and are sometimes free), provided enough of the right people still understand the context in which the data is used. Data-architecture reviews during development can resolve these conundrums, but more often than not, it is the developer working closest to the data, at the time it’s developed, who can take just a little more time to set things up well and “future-proof” the code to the greatest extent possible at the time of creation.
From a SQL Server and .NET perspective, I have seen scores of database tables and columns, and C# and VB.NET classes, with no metadata to describe the creator’s intent and use-cases. This information about the stored data is crucial to using these data structures correctly; in turn, it allows software architects, business analysts, database administrators, and developers to ensure a correct and mutual understanding of the data they are storing and manipulating.
For C# code, XML doc (documentation) comments are applied to classes, class-properties, and class-methods via the /// auto-completion trigger. For VB.NET, use the ''' auto-completion trigger. In SQL Server, similar metadata is typically stored in an extended property named MS_Description, and more custom metadata properties can be created beyond that, per column, table, or other database object.
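As a minimal sketch, here is how a description might be attached to a column in SQL Server using the built-in sp_addextendedproperty procedure (the dbo.Product table, its Price column, and the description text are hypothetical examples):

```sql
-- Attach a human-readable description to dbo.Product.Price
-- as the standard MS_Description extended property.
EXEC sys.sp_addextendedproperty
    @name       = N'MS_Description',
    @value      = N'Wholesale price per unit in USD, excluding tax.',
    @level0type = N'SCHEMA', @level0name = N'dbo',
    @level1type = N'TABLE',  @level1name = N'Product',
    @level2type = N'COLUMN', @level2name = N'Price';

-- To change an existing description, use sys.sp_updateextendedproperty;
-- to remove one, use sys.sp_dropextendedproperty.
```

Because MS_Description is the property name that SQL Server Management Studio itself writes when you fill in a column’s Description in the table designer, most third-party tools look for it by default.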
Data dictionaries & IntelliSense
Consistently applied, this metadata can drive third-party tools that pick it up and turn it into data dictionaries or automatic code documentation, refreshed periodically via a scheduled job and published to a Confluence server, SharePoint, or another internal website. Think about it: a single source of truth for what each table and column in your SQL Server represents, surviving the many developers and DBAs who have modified the database over the years. A single source of truth for the meaning and proper use-cases of every class, property, and method in every namespace of your company’s .NET code. No more extensive code research to figure out from use-context what that field added to your table ten years ago really stands for, then not writing it down, only for someone else to repeat that whole process five years from now when a vendor implementation is underway.
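A data dictionary of this kind can be generated straight from the catalog views. The query below is a sketch, assuming descriptions were stored under the standard MS_Description property as shown earlier:

```sql
-- List every table column together with its MS_Description, if one exists.
SELECT  s.name  AS SchemaName,
        t.name  AS TableName,
        c.name  AS ColumnName,
        CAST(ep.value AS NVARCHAR(MAX)) AS Description
FROM    sys.tables t
JOIN    sys.schemas s ON s.schema_id = t.schema_id
JOIN    sys.columns c ON c.object_id = t.object_id
LEFT JOIN sys.extended_properties ep
        ON  ep.class    = 1                 -- class 1 = object or column
        AND ep.major_id = c.object_id
        AND ep.minor_id = c.column_id
        AND ep.name     = N'MS_Description'
ORDER BY s.name, t.name, c.column_id;
```

The LEFT JOIN matters: columns with no description still appear, with a NULL Description, which makes the undocumented gaps easy to spot and schedule for cleanup.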
Applied at the database level, plug-ins for SQL Server Management Studio, such as SQL Prompt, will override the very weak IntelliSense provided natively, picking up this metadata and displaying it as you create and modify stored procedures, views, functions, and so on. When you are dealing with anything more than a trivial database, this is a huge boon to productivity and coding accuracy. It’s the difference between knowing and not knowing whether the ‘Price’ column in the ‘Product’ table holds an MSRP, a discounted retail price, or a wholesale price, and whether it is used by internal applications, the public-facing website, or both. Knowing that information about the data is absolutely crucial to a software project’s success and to reducing labor costs.
I think this graphic from eRSA in Australia sums up the need for metadata well, though it comes at it from the perspective of a large research university:
(see below the large graphic for links to my posts where I describe where to view and how to update the metadata in SQL Server)