Model Maintenance Best Practices in Element Unify
In the practice of data engineering, context is extracted from different data sources to ultimately create a comprehensive, context-rich data model that can be used in consuming applications. Customers bring a variety of data sources to Element Unify including tag definitions from AVEVA PI Data Archive or AspenTech IP 21, maintenance records from SAP PM or Maximo, engineering documents such as P&IDs and design specs, and operator logs, to name a few. The challenge with these data systems is that not all naming conventions are the same, mainly because of creating new tags over time that are not compliant with the standards or because of the acquisitions of new units and businesses. While Element Unify facilitates the cleansing and integration of these data sources significantly, our customers often find themselves in a case where 10-20% of their data do not follow any business logic; thus, they cannot be easily interpreted, cleansed, or mapped to other data sources via repeatable data transformations. The remaining 80-90% of the developed context-rich data model also needs validation before using it in the production environment. In both cases, the pipeline developers tend to seek consultation from the subject matter experts (SMEs) on the business unit or the data owners. Their process usually involves downloading an existing version of the data model that they have created in Element Unify in an Excel format, sending it off to SMEs via email, and asking them to input their feedback inside the Excel file, sending it back via email, and finally uploading it into Unify so they can apply their feedback. This process by itself is prone to errors:
- There could be different versions of the same file that could be distributed via email. For instance, multiple SMEs may need to provide input on the same file, thus easily producing multiple versions of the same file
- The changes made in Excel are hard to track
- The thought process in making changes on the data model is not transparent, thus, making it difficult to validate at a later time
While most companies at the industrial scale try to break their data silos, this process yet creates data silos: Element Unify and Excel! This is the main reason Element team has added a new transformation for collaboration: Flow Audit
Flow Audit transformation provides an environment where developers and SMEs can collaborate to close the gaps on their data model and validate the data model before using it in production. With Flow Audit, users can freely change the content of the data model and fill in the blanks while all edits made on the model are being tracked. For instance, in the image below, the Tag Name, Asset Attribute, Asset Template, Region, and Functional Location are some key information that usually need to be validated/filled in. In such cases, the SME can easily overwrite any of the text objects provided in the Flow Audit, except for what is decided by the user as the ID column. The ID column is a column with unique identifiers that is used to track the changes made in the data. As seen in the second image below, the region associated with the tag 01II8419.PV was changed from NAM to APAC. If you, as the pipeline developer, decide that the edit made is incorrect, you can easily revert it to the original value.
While the Flow Audit transformation may only present itself as a collaboration terminal, it can also help with the model maintenance over time. Imagine the case where the data model has been developed and put into production. Now, after 6 months of putting the data model into production, new tags are created on the time series historian. If the new tags are created following the naming convention standards adopted at the corporate, the pipeline will pick up the new tags, automatically apply all the operations defined in the pipeline on the tags, and send them through to the destination system. In this view, no manual human intervention is needed. However, the chances are some tags may not have followed the standard naming conventions; thus, they fell through at some point in the pipeline. For such one off cases, it is best practice to use the Validator Transformation to be able to first catch them in an easy to track visual way. Additionally, it may not make sense for the business to dedicate additional pipeline developer time and resources to re-design the pipeline logics to account for a few one off cases. Instead, in many cases a business unit (as the ultimate data owner) wants to be able to correct the mistakes themselves without having to pull over pipeline developer resources so that all data can flow through again to the consuming applications. The Flow Audit transformation enables such handover of the data model maintenance from the pipeline developers to business owners, thus enabling both teams to remain productive at what matters the most to them.
Following are some best practices around using the Flow Audit Transformation:
1- Use a Column ID that is unique; if you are planning to use tag names as the identifier, you need to make sure they are unique. While in an industrial space the tag names are unique, it is possible that you may have used them in a Join transformation, which tends to cause duplication when there are multiple matches for the same tags. If this is the case, you either have to remove the duplicated tags before the Flow Audit transformation or use another column as the unique identifier. A good alternative is to create a row number column on the input data. You can create a row number column via the following query in the SQL Query transformation:
SELECT *, Monotonically_increasing_id() AS row_number FROM in
2- When presenting the data for SMEs to validate, it is best practice to limit the number of columns in the Flow Audit transformation to the minimum required to avoid any unintended overwriting of texts for columns that feedback is not needed (unless those additional columns provide the context needed to receive the feedback).
3- If you need to provide an area for the SMEs to add commentary, you can create a blank column for them to add any feedback they may have in the provided space. You can create a column via the following query in the SQL Query transformation:
SELECT *, '' AS Comment_Column FROM in
4- While Flow Audit can be used anywhere in the pipeline, it is best practice to not place it directly after the special transformations Map Template and Map Attributes, unless the additional columns created after these transformations (i.e. TEMPLATE*, SENSOR*, EQUIPMENT_ID*) are renamed. Generally, it is best to do as much as possible with the pipeline and use Flow Audit at the very last stages of the pipeline where feedback is needed on the remaining work that does not follow any business logic.
5- It is best practice to only share the link to the Flow Audit page and ask SMEs to only work within that page to avoid any unintended modifications to the pipelines. Please note that Element will soon enable a Workspace environment where users have access privileges per their role and the latter recommendation for SMEs will no longer be needed.