Planet Four clustering investigations
We want to share a quick update on the details we are investigating to close out the last open issues in our analysis pipeline, which identifies fans and blotches from the classifications made in the original Planet Four.
We recently realized that it might be a good idea to allow different clustering limits depending on the general marking size. Intuitively this makes sense: the smaller an object is, the more carefully one tends to mark it, so markings of small objects should scatter less than markings of large ones.
However, more clustering versatility also means more parameters to set, and each of them needs to be tested for its efficacy.
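To make the split-by-size idea concrete, here is a minimal sketch. It assumes a DBSCAN-style density clustering, which this post does not spell out. The function names, the `size_limit` threshold and the toy coordinates are all hypothetical and only serve as illustration; this is not our actual pipeline code.

```python
import math


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN-style clustering (illustrative, O(n^2)).

    Returns a list of clusters, each a sorted list of point indices.
    """
    n = len(points)
    # Neighbourhood of every point within distance eps (includes itself).
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None = unvisited, -1 = noise, >= 0 = cluster id
    clusters = []
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # noise for now, may become a border point later
            continue
        cid = len(clusters)
        labels[i] = cid
        members = [i]
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid  # former noise point joins as border point
                members.append(j)
            if labels[j] is not None:
                continue
            labels[j] = cid
            members.append(j)
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # only core points expand
        clusters.append(sorted(members))
    return clusters


def cluster_by_size(markings, size_limit, eps, eps_large, min_samples):
    """Cluster small and large markings separately, each with its own limit.

    markings: list of (x, y, size) tuples; size_limit is a hypothetical
    threshold that decides which markings count as "large".
    """
    small = [m for m in markings if m[2] < size_limit]
    large = [m for m in markings if m[2] >= size_limit]
    result = []
    for group, group_eps in ((small, eps), (large, eps_large)):
        xy = [(m[0], m[1]) for m in group]
        for indices in dbscan(xy, group_eps, min_samples):
            result.append([group[k] for k in indices])
    return result


# Made-up toy data: five tightly placed small markings and five
# widely scattered large markings.
toy_markings = [
    (100, 100, 10), (105, 100, 10), (100, 105, 10), (95, 100, 10), (100, 95, 10),
    (400, 300, 200), (440, 300, 200), (400, 340, 200), (360, 300, 200), (400, 260, 200),
]
split_result = cluster_by_size(toy_markings, 100, eps=20, eps_large=70, min_samples=5)
single_result = cluster_by_size(toy_markings, 100, eps=20, eps_large=20, min_samples=5)
```

With these toy numbers, the split run keeps both the small and the large object, while the run that uses the small limit for everything loses the large one, because its markings are too far apart to be grouped.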
Below you can see two plots, one for “fan” markings, the other for “blotch” markings, that show different parameter settings for a clustering run and their effects on the final result.
This Planet Four image tile, with the ID ‘17a’, is one of the more problematic ones: it contains a very large but diffusely defined blotch, and the markings are, understandably, all over the place.
Each plot title calls out the values of EPS and EPS_LARGE. These are the above-mentioned distance limits for clustering. Here I keep EPS, the limit for smaller markings, constant over several tests, while I step EPS_LARGE, the limit for larger markings, from 50 to 90 in steps of 20 (that is, 50, 70 and 90).
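The sweep itself is easy to write down. The following is only a sketch: the constant EPS value of 20 and the exact title format are assumptions for illustration, since neither is stated in this post.

```python
def eps_large_sweep(start=50, stop=90, step=20):
    """Return the EPS_LARGE values to test, with the end point included."""
    return list(range(start, stop + 1, step))


def plot_titles(eps, eps_large_values):
    """Build one title per run; the title format here is hypothetical."""
    return [f"EPS={eps}, EPS_LARGE={value}" for value in eps_large_values]


EPS = 20  # assumed placeholder, held constant across all runs
titles = plot_titles(EPS, eps_large_sweep())
for title in titles:
    print(title)
```

Holding one limit fixed while stepping the other keeps the comparison between the plots clean, because any change in the result can only come from EPS_LARGE.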
As one can see, the large blotch “survives” in all cases (which it did not before we introduced the split-by-size clustering approach), while in the fan case it only survives when the “MS” parameter, the minimum number of markings that a surviving cluster needs to have, is set to 5. When requesting 7, there are simply not enough markings for it to survive. But that’s okay, because I’m pretty sure that we will have this object survive as a blotch rather than a fan, since more markings voted for a blotch.
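The survival and fan-versus-blotch reasoning can be sketched in a few lines. This is a simplified illustration, not the final decision rule of our pipeline: the function name, the marking counts in the usage lines and the tie-breaking behaviour are all hypothetical.

```python
def pick_label(n_fan, n_blotch, min_samples):
    """Decide whether an object survives as a fan, a blotch, or not at all.

    n_fan, n_blotch: number of markings in the fan and blotch cluster.
    min_samples: the "MS" parameter, i.e. the minimum number of markings
    a cluster needs in order to survive.
    """
    counts = {"fan": n_fan, "blotch": n_blotch}
    # Only clusters with enough markings survive at all.
    surviving = {kind: n for kind, n in counts.items() if n >= min_samples}
    if not surviving:
        return None
    # Among the survivors, the kind with more markings wins.
    # Ties go to "fan" here, which is an arbitrary choice for this sketch.
    return max(surviving, key=surviving.get)


# Hypothetical counts: 6 fan markings and 12 blotch markings.
label_ms5 = pick_label(6, 12, min_samples=5)  # both survive, blotch has more votes
label_ms7 = pick_label(6, 12, min_samples=7)  # fan cluster drops out, blotch remains
```

In both of these made-up cases the object ends up as a blotch, which is why losing the fan cluster at MS = 7 does not worry me much.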