New version of Unify - 12 Sep 2023

12 Sep 2023 Release Notes
This release makes it easier to view and work with nulls and empty strings, and preserves the distinction between them to ensure data accuracy.
Build Models
Manage Nulls vs. Empty Strings
In data definition and software programming, there is a distinct difference between a null and an empty string. A null means the absence of a value (there is no value at all), while an empty string is a value: a string whose length is zero. To check for nulls, use the Is Null operator. To check for empty strings, use = ''.
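As an illustration of the two checks, here is a minimal Python sketch in which Python's None stands in for null. The function names are hypothetical and are not part of Unify. They only mirror the semantics of Is Null and = '', and show why a generic "is it empty?" truthiness test hides the difference.

```python
from typing import Optional


def is_null(value: Optional[str]) -> bool:
    """Analogue of the Is Null operator: true only when no value is present."""
    return value is None


def is_empty_string(value: Optional[str]) -> bool:
    """Analogue of = '': true only for a string of length zero."""
    return value == ""


def is_falsy(value: Optional[str]) -> bool:
    """A truthiness check that conflates nulls and empty strings (a common pitfall)."""
    return not value


for v in (None, "", "test"):
    print(repr(v), is_null(v), is_empty_string(v), is_falsy(v))
```

The last column shows that a check such as `not value` returns the same answer for both None and "", which is exactly the ambiguity the dedicated operators avoid.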
Unify has always treated nulls and empty strings differently within a pipeline; that is, data transforms handle them differently. For example, concatenating two empty strings ('' + '') returns an empty string (''), concatenating an empty string with a string ('' + 'test') returns 'test', but concatenating a null with anything returns a null. However, in some cases Unify did not preserve the distinction between the two, for example when downloading a derived dataset as a CSV file.
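The concatenation rules above can be modeled in a few lines of Python, assuming None represents null and that a null operand propagates to the result, as it does in SQL. `concat` is a hypothetical helper for illustration, not a Unify transform.

```python
from typing import Optional


def concat(left: Optional[str], right: Optional[str]) -> Optional[str]:
    """Concatenate two values; a null on either side makes the result null."""
    if left is None or right is None:
        return None  # null propagates instead of being treated as ''
    return left + right


print(repr(concat("", "")))        # ''
print(repr(concat("", "test")))    # 'test'
print(repr(concat(None, "test")))  # None
```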
To make it easier to view and use nulls and/or empty strings, we’ve made three changes to Unify:
- You can now see the difference between nulls and empty strings in the UI. This removes ambiguity about whether a value is a null or an empty string, but doesn’t change the output of any pipeline.
- Unify now supports Avro files for derived datasets to preserve the distinction between nulls and empty strings. You can also upload/download a derived dataset as an Avro file. CSV is still supported.
- Throughout the entire flow, Unify maintains the distinction between nulls and empty strings.
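The reason CSV loses the distinction can be seen with Python's standard `csv` module: with the default dialect, its writer emits None as an empty field, so after a round trip a null and an empty string come back identical. This sketch illustrates the CSV format in general, not Unify's own exporter. Avro, by contrast, can declare a field as a union of `null` and `string`, so the two values stay distinct.

```python
import csv
import io

# One row holds a null (None), the other an empty string.
rows = [["id", "name"], ["1", None], ["2", ""]]

buffer = io.StringIO()
csv.writer(buffer).writerows(rows)  # None is written as an empty field
text = buffer.getvalue()

buffer.seek(0)
parsed = list(csv.reader(buffer))  # every field is read back as a str

print(text.splitlines())  # ['id,name', '1,', '2,']
print(parsed)             # [['id', 'name'], ['1', ''], ['2', '']]
```

Both data rows serialize to the same shape, so a reader has no way to tell which one originally held a null.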
Impact of these updates on existing pipelines
Any process or system that relies on empty strings becoming nulls (or the opposite) may behave differently, because the distinction is now preserved. For example:
- A series of chained pipelines whose flow relies on the distinction being lost, e.g. pipeline 1 writes to a derived dataset that feeds into pipeline 2
- Any pipeline whose destination system assumes that the derived dataset contains no empty strings
- Any use of a downloaded CSV of the flow by a consumer that doesn’t accept nulls
- Note that for non-nullable columns, the value is assumed to be an empty string, because nulls aren’t permitted
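If a destination assumes that no empty strings will arrive, one option is a normalization step just before delivery. The Python sketch below is a hypothetical adapter, not a Unify feature: it assumes rows arrive as dictionaries of column name to value, converts empty strings to None in nullable columns, and, following the rule for non-nullable columns, turns any None into an empty string there.

```python
from typing import Dict, Optional, Set

Row = Dict[str, Optional[str]]


def normalize_row(row: Row, non_nullable: Set[str]) -> Row:
    """Hypothetical adapter for a destination that does not expect empty strings.

    - Nullable columns: '' becomes None.
    - Non-nullable columns: nulls are not permitted, so None becomes ''.
    """
    out: Row = {}
    for column, value in row.items():
        if column in non_nullable:
            out[column] = "" if value is None else value
        elif value == "":
            out[column] = None
        else:
            out[column] = value
    return out


print(normalize_row({"id": "1", "email": "", "code": None}, {"code"}))
# {'id': '1', 'email': None, 'code': ''}
```

Whether such a step is appropriate depends on the destination; the safer default is usually to update the consumer so that it handles both values explicitly.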
In general, we recommend reviewing any logic that checks for nulls or empty strings and confirming that it behaves as expected.
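One way to start that review is to profile a sample of a derived dataset and see which columns actually contain nulls, empty strings, or both. The following Python sketch is a hypothetical helper that assumes the sample is available as a list of dictionaries. It counts each case per column so you can focus on the columns where the preserved distinction might change downstream behavior.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def profile_missing(rows: Iterable[Dict[str, Optional[str]]]) -> Dict[str, Tuple[int, int]]:
    """Return {column: (null_count, empty_string_count)} for a sample of rows."""
    nulls: Counter = Counter()
    empties: Counter = Counter()
    columns = set()
    for row in rows:
        for column, value in row.items():
            columns.add(column)
            if value is None:
                nulls[column] += 1
            elif value == "":
                empties[column] += 1
    return {column: (nulls[column], empties[column]) for column in sorted(columns)}


sample = [
    {"a": None, "b": ""},
    {"a": "", "b": "x"},
    {"a": None, "b": None},
]
print(profile_missing(sample))  # {'a': (2, 1), 'b': (1, 1)}
```

Columns that report both counts greater than zero are the most likely places where a check written for one case silently misses the other.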
Other Updates
- Don't close flyout when saving a column configuration
- Group error messages with the same message
- Correct URL when deploying an Azure Ingest Json connector
- Display full API error for graph queries
- Make identity column required in the graph builder form
- Pipeline performance improvements: all customers should see a significant improvement in pipeline loading performance (when loading a pipeline from the pipeline catalog page or refreshing the pipeline canvas page). This change also significantly reduces pipeline execution time, flow preview time, and validator computation time.