Record Details

Modified Versions of Diving48: Shape and Texture

Harvard Dataverse

 
Title: Modified Versions of Diving48: Shape and Texture

Identifier: https://doi.org/10.7910/DVN/MXJPIZ

Creator: Broomé, Sofia

Publisher: Harvard Dataverse
 
Description: We modify the Diving48 dataset ("RESOUND: Towards Action Recognition without Representation Bias", Li et al., ECCV 2018) into three new domains: two based on shape and one based on texture (following Geirhos et al., ICLR 2019).

Note that the Statistical Visual Computing Lab at UC San Diego (http://www.svcl.ucsd.edu) holds the copyright to the Diving48 dataset.
Please cite the RESOUND paper ("RESOUND: Towards Action Recognition without Representation Bias", Li et al., ECCV 2018) if you use any data related to the Diving48 dataset, including the modified versions here.

In the shape domains, we blur the background and keep only the segmented diver(s) (S1) or their bounding boxes (S2). In the texture domain (T), we conversely mask out the bounding boxes containing the diver(s) and keep only the background. The masked boxes are filled with the average ImageNet pixel value (following Choi et al., NeurIPS 2019). The class evidence should lie only in the divers' movement; hence, the texture version should contain no relevant signal, and accuracy on it should drop to chance level. We can thus study how much different models drop in score when tested on the shape or texture domains, which indicates cross-domain robustness (for S1 and S2) and texture bias (for T).
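As a rough illustration of the masking scheme (a minimal sketch, not the exact pipeline used to produce these files), the snippet below builds the T and S2 variants of a single frame from hypothetical diver bounding boxes; the RGB fill value is the commonly cited ImageNet channel mean and is an assumption here. S1 would additionally require the per-pixel diver segmentation rather than boxes.

```python
import cv2
import numpy as np

# Commonly cited per-channel ImageNet mean (RGB, 0-255 scale); assumed here
# to be what "the average ImageNet pixel value" in the description refers to.
IMAGENET_MEAN = np.array([124, 116, 104], dtype=np.uint8)

def texture_domain(frame, boxes):
    """Domain T: mask out the diver bounding boxes, keep only the background."""
    out = frame.copy()
    for x1, y1, x2, y2 in boxes:
        out[y1:y2, x1:x2] = IMAGENET_MEAN  # mean broadcasts over the box
    return out

def shape_domain_s2(frame, boxes, ksize=31):
    """Domain S2: keep the diver bounding boxes, blur everything else."""
    out = cv2.GaussianBlur(frame, (ksize, ksize), 0)
    for x1, y1, x2, y2 in boxes:
        out[y1:y2, x1:x2] = frame[y1:y2, x1:x2]
    return out

# Example with a dummy frame and one made-up diver box (x1, y1, x2, y2).
frame = np.zeros((224, 224, 3), dtype=np.uint8)
boxes = [(80, 40, 140, 200)]
t_frame = texture_domain(frame, boxes)
s2_frame = shape_domain_s2(frame, boxes)
```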

This modified dataset was introduced in "Recur, Attend or Convolve? Frame Dependency Modeling Matters for Cross-Domain Robustness in Action Recognition", Broomé et al., arXiv:2112.12175. Only the test set of Diving48 was used there; we did not train on the modified domains, only evaluated on them.

The files are .mp4 videos of 32 frames each, regardless of the length of the original clip (the original clips are typically around 5 seconds long).
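For reference, a minimal way to read one of these clips into memory with OpenCV (the file name is hypothetical; the 32-frame check reflects the format described above):

```python
import cv2

def load_clip(path):
    """Read all frames of a clip as RGB arrays; each modified Diving48
    clip should contain exactly 32 frames."""
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    assert len(frames) == 32, f"expected 32 frames, got {len(frames)}"
    return frames

clip = load_clip("some_diving48_test_clip.mp4")  # hypothetical file name
```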

We may also upload the training set; please contact us if you need it urgently. Otherwise, the trained model for diver segmentation is released at https://github.com/sofiabroome/diver-segmentation if you want to perform the cropping and saving yourself, at your own desired frame rate.
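If you regenerate clips yourself, one simple way to obtain a fixed number of frames from an original clip of any length is uniform temporal subsampling. A minimal sketch (separate from the segmentation step, which the repository above handles; not the pipeline used for these files):

```python
import cv2
import numpy as np

def uniform_subsample(path, n_frames=32):
    """Pick n_frames indices evenly spaced over the whole video."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    keep = set(np.linspace(0, total - 1, n_frames, dtype=int))
    frames = []
    for i in range(total):
        ok, frame = cap.read()
        if not ok:
            break
        if i in keep:
            frames.append(frame)
    cap.release()
    return frames
```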
 
Subject: Computer and Information Science
Keywords: diving48; texture bias; shape bias; cross-domain robustness; domain shift; action recognition; fine-grained action recognition

Contributor: Broomé, Sofia