Physical Schema Naming Conventions
Before designing your physical schemas it is highly recommended to take the time to develop naming conventions. Because table and view names are Immutable, and changing column names after the fact will cause issues with reports and business schemas, it is worth the time up front to design a set of conventions and a method of governance. Additionally, a set of standards will provide clarity and consistency to your schemas and minimize the risk of losing institutional knowledge from staff turnover. It can be helpful to think of three kinds of rules that will comprise your complete naming convention:
- Semantic - Based on the components of the object
- Syntax - How the components of a name should be arranged
- Lexical - The language-related aspects of names
Within the physical schema, the overall goal of the naming standards (besides consistency) should be to easily identify the source of each object. Business friendly naming will typically occur at the Business Schema level.
We recommend that you be familiar with your data sources and these Incorta concepts before exploring this topic further:
These concepts apply to all releases of Incorta for both on-premise and cloud.
As a starting point you should review the name validation rules for physical schema table names:
- Must be unique to the physical schema
- Must be between 1 and 128 characters in length
- Must begin with an alpha character, lower or upper case
- After the first alpha character, can contain zero or more alphanumeric characters in lower, upper, or mixed case
- After the first alpha character, can contain zero or more underscore
(_)or Dollar sign
- Besides underscore
(_)and Dollar sign
($), cannot contain special characters, symbols, or spaces
It is also recommended to decide on a set of representation terms if columns will need to be renamed from their source system(s), although these typically will be utilized more often in your business schemas. This example list of representation terms and definitions is based on the Class Word list in the Guide on Data Entity Naming Conventions from NIST.
Amount – Monetary quantity.
Average – Numeric value representing an arithmetic mean.
Count – Non-monetary numeric value arrived at by counting.
Code – A system of valid symbols that substitute for longer values.
Date – Calendar date.
Measure – A record of the dimensions, capacity/amount (non-monetary) of an object.
Name – A designation for an object.
Number – A number associated with an object, used as an identifier.
Quantity – Non-monetary numeric value not arrived at by counting.
Rate – A quantity or amount considered in relation to another quantity or amount.
Text – An unformatted descriptive field.
Time – Time of day or duration.
Lastly you should compile a list of all of the data sources you are going to be using, as well as any likely to be added in the future. Taking a holistic view now will help prevent contradictions and re-work down the road. With these foundational pieces in place, you can begin identifying your rules
Your first step should be identifying the semantic rules you wish to use. These are a set of guidelines for naming based on the components or attributes of the table or column.
- Schema names are typically based on their underlying source, or the functional area the data covers.
- When dealing with single source tables it is often useful to keep the table and column names identical to the source to assist in understanding the lineage of your data.
- Column names are similarly kept identical in most cases, or should be built based off your representation terms previously defined.
- Multi-source tables are typically a concatenation of the underlying tables, and may or may not include an identifier for the source system.
- When utilizing a data warehouse as a data source, it can be useful to identify the type of table (dimension or fact) if there is not already an identifier. For Dimension tables the most common identifiers are Dim, Dim_ or d_ while Fact tables are typically denoted with fact or f. Schema manager allows you to sort by name ascending/descending, and also allows you to search by name so it is easy to essentially filter your view of the tables down to dimension or fact.
- Related, some customer find it useful to add an identifier to the end to identify the type of table the object is:
Now that you have defined semantic rules, your syntax rules will specify the arrangement of these components. The most common usage is the class (dim, fact, etc.) is given the leftmost position in the name, with the type (table, alias, view, etc.) given the rightmost position. Between these two terms the basic table name will reside. For column names that are not retaining their original names, any qualifying terms should be placed before the representation term.
Lexical Rules are another aspect that often come into play more heavily in Business Schemas, but it is still critical to identify and document them now. These rules will determine the look and feel of your object names. Some factors to consider and potential examples for your lexical rules include:
- Nouns - Should always be in singular form
- Verbs - If they are allowed, they are in the present tense.
- Spacing - Are underscores utilized or not?
- Case - lower, UPPER, camelCase, PascalCase, etc. All are acceptable but only one should be used.
- Assuming abbreviations, acronyms, and initialisms will be used, create and publish a list of accepted values.
If you are feeling overwhelmed by all of this, there are additional resources you can look to, including ISO 11179-5, Information technology – Specification and standardization of data elements, Part 5: Naming and identification principles for data elements, or the Guide on Data Entity Naming Conventions from NIST.
Outside of adhering to validation rules, there is no "right" or "wrong" when designing naming standards as long as they are comprehensive, consistent, clear and communicated. And with a well-designed set of naming conventions, you will be ready to move on to designing your schemas.