Questions regarding Raw data vs. Consensus aggregated data

by asaffoox

Hi All,
As part of my master's thesis, I am working with two datasets I found on the web:
consensus_data.csv
raw_data_for_dryad.csv

I don't understand how the consensus data was aggregated.

From the "consensus_data.csv" file, it seems the data is aggregated by CaptureEventID (with no reference to a specific image name). In some cases, though, the same CaptureEventID appears on more than one line, sometimes with the same species and sometimes with different ones.
In the "raw_data_for_dryad.csv" file, on the other hand, I can see a list of users and their classifications for each CaptureEventID (again with no reference to a specific image name).
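
To make the duplication concrete, here is a minimal sketch of one way to list the repeated CaptureEventIDs and separate the "same species" cases from the "different species" ones. It is only an illustration: the sample rows are made up, the column name `Species` is an assumption about the file's header (only CaptureEventID is known from the file), and the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows for illustration only; the real consensus_data.csv
# has more columns, and the "Species" column name is an assumption.
SAMPLE = """CaptureEventID,Species
EVT001,zebra
EVT001,wildebeest
EVT002,lion
EVT003,zebra
EVT003,zebra
"""


def find_duplicated_events(csv_file):
    """Return {CaptureEventID: [species, ...]} for IDs found on more than one row."""
    species_by_event = defaultdict(list)
    for row in csv.DictReader(csv_file):
        species_by_event[row["CaptureEventID"]].append(row["Species"])
    return {
        event: species
        for event, species in species_by_event.items()
        if len(species) > 1
    }


def split_by_kind(duplicates):
    """Split duplicated IDs into same-species repeats and different-species repeats."""
    same, different = [], []
    for event, species in duplicates.items():
        if len(set(species)) == 1:
            same.append(event)
        else:
            different.append(event)
    return same, different


duplicates = find_duplicated_events(io.StringIO(SAMPLE))
same_species, different_species = split_by_kind(duplicates)
print("Same species repeated:", same_species)
print("Different species:", different_species)
```

To run this against the real file, the `io.StringIO(SAMPLE)` argument could be replaced with a file handle opened via `open("consensus_data.csv", newline="")`, after adjusting the column names to match the actual header.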

My questions:
1. Why are there duplicated CaptureEventIDs in the "consensus_data.csv" file, with the same species and/or different ones?
2. Does the duplication mean there are several images for the same CaptureEventID, and that I therefore need to differentiate between the two classifications?
3. If item #2 above is correct (I should differentiate between the two classifications), why are there cases where the duplicated CaptureEventID lines state the same species? Shouldn't they be aggregated into a single line as well?
4. Is there a way to find each user's specific classification for each image per CaptureEventID?
Thanks in advance,
Asaf

Posted October 12, 2017 10:46 AM
by maricksu moderator in response to asaffoox's comment.

Hi asaffoox,

This seems to be the same question as in

https://talk.snapshotserengeti.org/?_ga=2.197090671.2138218604.1496352367-1394240084.1478376452#/boards/BSG0000001/discussions/DSG0001ult

Please check the info there 😃

Posted October 12, 2017 4:23 PM