[18:33:49] micahbf: @hazelux i didn't initially see the parameterization and distance calculation, which adds some complexity, but the idea would be to self join on title, which will only return (exact) dups from the database
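A minimal sketch of the self-join idea described above, using Python's built-in sqlite3 so it runs standalone. The `items` table, its columns, and the `exact_title_dups` helper are hypothetical names chosen for illustration; the original conversation does not show the real schema. The `a.id < b.id` condition is one way to keep a row from matching itself and to report each pair only once.

```python
import sqlite3


def exact_title_dups(conn):
    """Return (id_a, id_b, title) for every pair of rows sharing an identical title."""
    sql = """
        SELECT a.id, b.id, a.title
        FROM items AS a
        JOIN items AS b
          ON a.title = b.title   -- exact match only, no fuzzy distance
         AND a.id < b.id         -- skip self-matches and mirrored pairs
        ORDER BY a.id, b.id
    """
    return conn.execute(sql).fetchall()


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO items (title) VALUES (?)",
    [("Dune",), ("Emma",), ("Dune",)],
)

dups = exact_title_dups(conn)
```

With this sample data, rows 1 and 3 come back as a pair. Near-duplicates such as "Dune " or "dune" would not match, which is the limitation the message points out.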
[18:43:23] micahbf: @hazelux good luck! one thing is if you use some other algorithm for string similarity that doesn't do pairwise comparison, but just reduces each string to some simplified key, then you only need one pass over the records
[18:45:22] micahbf: totally depends on use case, but for example see soundex https://en.wikipedia.org/wiki/Soundex
[18:45:49] micahbf: the idea is that you would loop through once, indexing by soundex, then if you already have something indexed for the same soundex key, you have a dup
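A rough sketch of that single-pass approach, assuming standard American Soundex applied to each word of a title. The helper names `soundex`, `soundex_key`, and `find_dups` are made up for this example, and keying on a tuple of per-word codes is just one possible choice, not something the conversation specifies.

```python
# Consonant groups used by American Soundex; vowels, y, h, and w have no code.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code for a word ('' if it has no letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are ignored and do not break a run of equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return (result + "000")[:4]


def soundex_key(title):
    """Key for a whole title: one Soundex code per word (an assumption, see above)."""
    return tuple(soundex(word) for word in title.split())


def find_dups(titles):
    """Loop once, indexing by key; a key that is already indexed signals a dup."""
    index = {}
    dups = []
    for title in titles:
        key = soundex_key(title)
        if key in index:
            dups.append((index[key], title))
        else:
            index[key] = title
    return dups


pairs = find_dups(["Robert Smith", "Jane Doe", "Rupert Smyth"])
```

Here "Robert Smith" and "Rupert Smyth" share a key and are reported as a pair. Because each record is hashed once and looked up once, the work grows linearly with the number of records instead of quadratically, at the cost of Soundex being a much coarser notion of similarity than a Jaro-Winkler distance.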
[18:46:17] micahbf: what sort of dups are you seeing that you need the jaro-winkler distance checking?
[20:41:40] micahbf: in this case ActiveRecord::Associations::CollectionProxy::ActiveRecord_Associations_CollectionProxy_PaperTrail_Version
[20:47:19] micahbf: nor for that matter if I have access to proxy_association if I just open up the class