Status of analysis pipeline

Dear Citizen Scientists!

Long time no hear from me, sorry guys! Last year I was struggling to manage 4 projects in parallel, but at least one of them is now finally funded PlanetFour work (since last August), yeah!

I’m now down to three projects, with another one almost done, leaving me more time for PlanetFour. Things are progressing slowly but steadily. To recap, here’s where we are:

We have identified 5 major software pipelines that are required for the full analysis of the PlanetFour data, starting from your markings and ending with results at a level where they can be used in a publication or shown at a conference. Four of these pipelines are basically done and stable, while the fifth exists as a manual prototype that has not yet been turned into a stable chain of code running from beginning to end. Figure 1 shows the first four pipelines that are finished.

[Figure 1 image: Pipeline_shrunk]

Figure 1: The current manifestation of the PlanetFour analysis pipeline.

The need for the fifth pipeline was only discovered recently, when we tried to create the first science plots from PlanetFour data: some of the HiRISE input data that we use is of such high resolution (almost a factor of 2 better than the next level down) that the Citizen scientists discover a lot more detail than in the other data. This led to an unnatural jump in the number of marked objects over time, making us wonder for a bit why a sudden increase in activity would occur so late in the polar summer. That is, until I checked the binning mode of the HiRISE data that was used for those markings: all of the ‘funny data’ were taken at the highest resolution possible (while other data are binned down by a factor of 2 or 4 to stay within data-transport margins).

So, we now understand that we need to filter and/or sort by the imaging mode HiRISE was in when the data were taken. That is not a big deal; it just needs to be implemented in a stable fashion instead of as trial-and-error code in a Jupyter notebook.
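To give you a flavor of how simple that filter is conceptually, here is a minimal sketch (not the actual pipeline code; the column name ‘binning’ and the file name are just assumptions for illustration) of splitting the markings by HiRISE binning mode with pandas:

```python
import pandas as pd

# Hypothetical sketch: assume the HiRISE observation metadata (including the
# binning mode) has already been merged into the markings table, in a column
# named 'binning' with values like 1, 2, or 4.
markings = pd.read_csv("p4_markings_with_metadata.csv")

# Keep the full-resolution (bin 1) markings separate from the binned-down ones,
# so their higher level of detected detail can't masquerade as a seasonal jump.
bin1 = markings[markings["binning"] == 1]
binned = markings[markings["binning"].isin([2, 4])]

print(f"bin 1 markings: {len(bin1)}, bin 2/4 markings: {len(binned)}")
```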

Okay, the other thing that is new: for months we were clustering your markings together using only the x,y base coordinates of fans and the x,y center coordinates of blotches. This simplest approach already worked quite well, but a closer review of the acceptance and rejection rates revealed that some of the more ‘artistically’ motivated markings would survive this reduction scheme and create final average objects that, at a quick glance, would seem to come from nowhere. Take Figure 2 for example:

[Figure 2 image: artistic_marking_without_angle_clustering]

Figure 2: Process chain for one PlanetFour image_id. Upper left: the HiRISE tile as presented to Citizen scientists. Upper middle and right: the raw fan and blotch markings as created by YOU! 😉 Lower right and middle: the reduced cluster-average markings. Lower left: the resulting end products after fnotching and cutting at 50% certainty.

One can see that the lower left image, the output of the first 3 pipelines, contains some markings that seem to come out of nowhere. They are in fact created by an artistic set of fans visible in the upper middle plot, where three fan markings were placed where no visible ground features are. Because the base points of these 3 fans are nicely touching each other, they survive the clustering reduction: the algorithm thinks it is a group of valid markings. Or, better said, it *thought* so. I have taught it better now, and it includes the direction as a criterion for the clustering as well. As Figure 3 shows, this helps clean up the magical fans out of nowhere.

[Figure 3 image: artistic_marking_with_angle_clustering]

One can see that there are still some double blotches visible, but another loop over the remaining ones, checking for closeness to each other, will unify those as well.
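For the curious, here is a minimal sketch of the idea behind the direction-aware clustering. This is not the pipeline’s actual code: the use of scikit-learn’s DBSCAN and the values for eps, min_samples, and the angle scaling are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_fans(x, y, angle_deg, eps=25, min_samples=3, angle_scale=20):
    """Cluster fan markings on base position *and* direction.

    x, y are base-point pixel coordinates, angle_deg the fan direction in
    degrees; eps, min_samples, and angle_scale are illustrative values only.
    Returns one cluster label per marking, with -1 meaning 'rejected as noise'.
    """
    rad = np.radians(angle_deg)
    # Encode the angle as (cos, sin) so that 359 and 1 degrees end up close
    # together, and scale it so it is comparable to pixel distances.
    features = np.column_stack([x, y,
                                angle_scale * np.cos(rad),
                                angle_scale * np.sin(rad)])
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(features)

# Toy example: three touching base points with wildly different directions no
# longer form a valid cluster (labels -1), while three aligned fans still do.
x = np.array([10, 12, 11, 200, 202, 201])
y = np.array([10, 11, 12, 200, 201, 202])
angles = np.array([0, 120, 240, 45, 47, 43])
print(cluster_fans(x, y, angles))
```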

One last thing I want to mention is “fnotching”, as some of you might wonder what that actually means. In difficult-to-read terrain or lighting, or when the features on the ground are kind of hard to distinguish between fans and blotches, it happens that the same ground object is marked both as a fan and as a blotch, and both often enough to survive the clustering. We call these chimera objects “fnotches”, glued together from FaNs and blOTCHES. 😉 What we do is loop over the objects that survive the clustering, and if a fan and a blotch are close to each other, we store how many Citizens have voted for each, create a statistical weight out of that (the ‘fnotch’ value) and store that, too, with the fnotch object. Then, at a later point, depending on the demands on certainty, we can ‘cut’ on that value and, for example, say that we only consider something a fan if 75% of all Citizens that marked this object have marked it as a fan. That way we can create final object catalogs tailored to the science project that the catalog is being used for.
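As a toy example of that bookkeeping (hypothetical function names, and the two-sided resolution is my own illustration of a possible cut, not necessarily what the pipeline does):

```python
def fnotch_value(n_fan_votes, n_blotch_votes):
    """Fraction of all votes on this ground object that called it a fan."""
    return n_fan_votes / (n_fan_votes + n_blotch_votes)

def apply_cut(fnotch, threshold=0.75):
    """Resolve a fnotch object at a given certainty demand.

    Call it a fan if at least `threshold` of the voters marked a fan, a blotch
    if at most 1 - threshold did, and keep it as an undecided fnotch otherwise.
    """
    if fnotch >= threshold:
        return "fan"
    if fnotch <= 1 - threshold:
        return "blotch"
    return "fnotch"

# 9 fan votes vs 3 blotch votes -> fnotch value 0.75 -> passes the 75% fan cut.
print(apply_cut(fnotch_value(9, 3)))   # -> 'fan'
```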

We have just submitted another conference abstract with the most recent updates to the 47th Lunar and Planetary Science Conference, and I seriously, seriously want our paper to be submitted by then, so that you all can see what wonderful stuff we created from all your hard work!

Wish us luck and have a Happy 2016 everyone! Or, as the star of one of my favorite video blogs, HealthCare Triage, keeps saying: To the research!

Michael
