Snapshot Serengeti Talk

Questions regarding Raw data vs. Consensus aggregated data

  • asaffoox by asaffoox

    Hi All,
    As part of my master's thesis, I am working with two datasets I found on the web:
    consensus_data.csv
    raw_data_for_dryad.csv

    I don't understand how the consensus data was aggregated.

    From the information in the "consensus_data.csv" file, the data seems to be aggregated by CaptureEventID (with no reference to a specific image name), but in some cases the same CaptureEventID appears on several lines, with the same species and/or different ones.
    In the "raw_data_for_dryad.csv" file, on the other hand, I can see a list of users and their classifications for each CaptureEventID (again with no reference to a specific image name).

    My questions:

    1. Why are there duplicate CaptureEventID lines in the "consensus_data.csv" file, with the same species and/or different ones?
    2. Does the duplication mean that there are several images for the same CaptureEventID, and that I therefore need to differentiate between the two classifications?
    3. If #2 above is correct (I should differentiate between the two classifications), why are there cases where the duplicated CaptureEventID lines state the same species? Shouldn't those also be aggregated into a single line?
    4. Is there a way to find each user's specific classification for each image within a CaptureEventID?
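
    To make the duplication concrete, here is a minimal sketch of how the repeated CaptureEventID lines could be listed and split into "same species" and "different species" cases. It assumes a column named Species next to CaptureEventID; that column name, the helper duplicated_events, and the sample rows are hypothetical placeholders, not taken from the real file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows standing in for consensus_data.csv.
# Only the CaptureEventID column name comes from the real file; the
# Species column name and every value below are made-up placeholders.
SAMPLE = """CaptureEventID,Species
E001,zebra
E001,wildebeest
E002,zebra
E002,zebra
E003,lion
"""


def duplicated_events(rows):
    """Map each CaptureEventID found on more than one row to its species list."""
    species_by_event = defaultdict(list)
    for row in rows:
        species_by_event[row["CaptureEventID"]].append(row["Species"])
    # Keep only the events that appear on two or more lines.
    return {eid: sp for eid, sp in species_by_event.items() if len(sp) > 1}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
dups = duplicated_events(rows)

for event_id, species in dups.items():
    # One distinct species means the same label is repeated across lines.
    kind = "same species repeated" if len(set(species)) == 1 else "different species"
    print(event_id, species, kind)
```

    To try this on the real file, io.StringIO(SAMPLE) could be replaced with open("consensus_data.csv", newline=""), with the Species key adjusted to whatever the actual header is.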

    Thanks in advance,
    Asaf

  • maricksu by maricksu moderator in response to asaffoox's comment.

    Hi asaffoox,

    This seems to be the same question as the one asked in this discussion:

    https://talk.snapshotserengeti.org/?_ga=2.197090671.2138218604.1496352367-1394240084.1478376452#/boards/BSG0000001/discussions/DSG0001ult

    Please check the information there 😃
