Gold Standard Data
We wanted to give you a quick update on what the science team is up to. We’ve been hard at work on the project’s first paper, which is possible thanks to your help marking the fans and blotches in the Season 2 and Season 3 cutouts. Fingers crossed, we aim to submit the paper to a scientific journal by early Fall.
We’ve been developing the best method to combine your classifications to identify blotches and fans in the cutouts you see on Planet Four. About 30-100 people look at each cutout, and we need to combine their markings to get the locations and shapes of the seasonal features. That is one aspect the team is thinking about. Another thing we’ve started working on is measuring how well the project can identify fans and blotches in the images. You might think this is obvious, since the dark fans pop out on the screen when you look at the images, but for the paper we need to prove it. This is important because if we want to look at how carbon dioxide geyser/jet activity changes over time, we need to be able to show that the project as a whole really can identify most of the fans and blotches in the cutouts we show on the site.
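To give a flavor of what combining markings can look like, here is a minimal sketch in Python. It is not the method the science team is developing. It assumes each marking is reduced to a single (x, y) pixel position, and the function name `combine_markings`, the `radius` and `min_votes` parameters, and the sample numbers are all made up for illustration. Markings that land close together are grouped, and a group is kept only if enough people marked it.

```python
from math import hypot

def combine_markings(markings, radius=10.0, min_votes=3):
    """Greedily group (x, y) markings that lie within `radius` pixels of a
    group's running mean, then keep groups with at least `min_votes` markings.

    Illustrative only: `radius` and `min_votes` are assumed values.
    Returns a list of (mean_x, mean_y, number_of_markings) tuples.
    """
    clusters = []  # each cluster is a list of (x, y) points
    for x, y in markings:
        for cluster in clusters:
            cx = sum(p[0] for p in cluster) / len(cluster)
            cy = sum(p[1] for p in cluster) / len(cluster)
            if hypot(x - cx, y - cy) <= radius:
                cluster.append((x, y))
                break
        else:
            # No existing group is close enough, so start a new one.
            clusters.append([(x, y)])

    combined = []
    for cluster in clusters:
        if len(cluster) >= min_votes:
            cx = sum(p[0] for p in cluster) / len(cluster)
            cy = sum(p[1] for p in cluster) / len(cluster)
            combined.append((cx, cy, len(cluster)))
    return combined

# Hypothetical markings from several volunteers on one cutout.
markings = [(100, 100), (102, 98), (98, 101),
            (300, 300), (301, 299), (299, 302),
            (500, 50)]
features = combine_markings(markings)
```

In this toy example the two tight groups of three markings each survive as features, while the lone marking at (500, 50) is dropped because only one person marked it. Real fans and blotches also have shapes, sizes, and directions, which a sketch like this ignores.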
Some of the other Zooniverse projects have compared their results to simulated or synthetic data that was shown and classified on the site (so they knew the right answer for the synthetic data), or there was already a catalog covering a portion of the data that they could compare to. It’s difficult to make a simulated HiRISE image with fans. As you’ve seen from the images, there is so much variation in the color and texture of Mars, and in the shapes, sizes, and even colors of the fans, that this would likely be impossible to get just right. So what about a previously made catalog of a small subset of the data we’re showing on Planet Four? There are so many blotches and fans that no one has attempted this on a large enough scale for us to fully compare the results from Planet Four to. Planet Four really is making the first map of these dark seasonal features in the HiRISE monitoring images of the Martian South Pole. The science team didn’t even know at launch exactly how often you would see fans and blotches in the images on the site. Your clicks are telling us the answer.
Even if there were such a catalog of fans and blotches to compare to, it would have been made in a different way (not using the web interface we have for Planet Four) that likely has its own biases towards detection and non-detection of the seasonal features. So what do we do? Well, the science team can make our own ‘gold standard’ dataset by classifying a small subset of the cutouts we showed for Season 2 and Season 3 (a few percent of all the Season 2 and Season 3 cutouts) in the interface and use those markings in the same way as a catalog. Candy, Anya, and Michael have stared at many, many images of fans and blotches from HiRISE, and one of my graduate preliminary exam projects was on mapping the fans and blotches in images from a previous lower resolution camera (Mars Orbiter Camera – MOC). So we can argue that we should be able to identify fans and blotches reasonably well in the Planet Four images and use our markings to create a catalog, which we are calling the gold standard dataset. Other Zooniverse projects, such as Snapshot Serengeti, have done something similar.
Right now, we’re going through a mostly random selection of cutouts from Season 2 and Season 3, marking the fans and blotches we see with the same classification interface on the Planet Four website. To get a large enough sample of cutouts reviewed, each of us is mainly marking different cutouts, but we do have a small amount of overlap so that we can compare our results to each other and understand the differences. For example, I’ve looked at fewer images of HiRISE fans than the rest of the science team, which might make me more liberal with my markings than, say, Candy or Anya. The overlap should help us understand and calibrate for those kinds of effects between the different science team markings.
It might seem like we’re testing you, but we’re really testing the project. This gold standard dataset is going to help us explore and show how well Planet Four as a whole can identify the seasonal fans and blotches in the images from HiRISE. Without the analysis and comparison to the gold standard data, we would just have a catalog of fans and blotches on the surface, with no measure of how complete or reliable it is. The comparison to the gold standard data is a really vital part of the project, and it allows us to study how the Martian climate impacts the formation of the seasonal blotches and fans from Martian year to year and throughout a given season.
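As a rough illustration of what such a comparison could involve, here is a small Python sketch. It is only an assumption about how one might do it, not the team's actual analysis. It treats each feature as a single (x, y) pixel position, and the function name `match_fraction`, the `tolerance` value, and the coordinates are hypothetical. It asks two questions: what fraction of the gold standard features did the project find, and what fraction of the project's features also appear in the gold standard?

```python
from math import hypot

def match_fraction(reference, candidates, tolerance=15.0):
    """Fraction of `reference` points that have at least one point in
    `candidates` within `tolerance` pixels. `tolerance` is an assumed value."""
    if not reference:
        return 0.0
    matched = sum(
        1
        for rx, ry in reference
        if any(hypot(rx - cx, ry - cy) <= tolerance for cx, cy in candidates)
    )
    return matched / len(reference)

# Hypothetical feature positions for a single cutout.
gold = [(100, 100), (300, 300), (450, 120), (50, 400)]
project = [(103, 98), (298, 305), (600, 600)]

# Share of gold standard features that the project recovered.
completeness = match_fraction(gold, project)
# Share of project features that the gold standard also contains.
reliability = match_fraction(project, gold)
```

With these made-up numbers, the project recovers two of the four gold standard features, and two of its three features have a gold standard counterpart. Numbers like these, measured over many cutouts, are the kind of evidence that can back up the claim that the project finds most of the fans and blotches.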
We’ll keep you posted about the gold standard data and our analysis as it continues to develop. Stay tuned to the blog for more updates about the paper as we get further into the Summer.
4 responses to “Gold Standard Data”
Hi Meg, thanks for the update/explanation. That is a challenge you all are engaged in for sure. One point I did not pick up on in the blog was the need to detect and file accordingly for SUBLIMATION, which by color could be mistaken for a fine dusting at the end of a subsurface OUT GASSING/VENTING field. Use of the entire image would help with the verification. For example, some clips show a “wet” looking surface area with all features visible and appearing to be “without dust” from out gassing/venting. Keep up the great work and best wishes for a great report in the Fall! : )