SQL Transformation Trick

Ever wanted to find a quick way to increase your match rate when unifying your data? I recently came across a cool way to use text analysis to do just that. Check it out!
I started with a query on my tag list (using SQL transformation), to count the frequency of all the different words in my tag description field:
SELECT `word`, count(*) AS count_words from
(SELECT explode(split(`Description`, ' ')) as `word` from In)
group by `word`
order by count_words desc
And then I used another SQL transformation to find tags with keywords I was interested in
Select 'Valves' as `Asset Type`, * From In
WHERE `Description` LIKE '%VLV%' OR
`Description` LIKE '%Valv%'
This allowed me to then join my tag list with my other data sources using these common key words.
Comments
-
This is very cool @kanyinsa - thanks for sharing!