Please Sign In or Register to join the Community to see more content!

SQL Transformation Trick

Ever wanted to find a quick way to increase your match rate when unifying your data? I recently came across a cool way to use text analysis to do just that. Check it out!

I started with a query on my tag list (using SQL transformation), to count the frequency of all the different words in my tag description field:

SELECT `word`, count(*) AS count_words from 

(SELECT explode(split(`Description`, ' ')) as `word` from In) 

group by `word` 

order by count_words desc


And then I used another SQL transformation to find tags with keywords I was interested in

Select 'Valves' as `Asset Type`, * From In

WHERE `Description` LIKE '%VLV%' OR

`Description` LIKE '%Valv%'



This allowed me to then join my tag list with my other data sources using these common key words.

Comments