Record Details

SR-BH 2020 multi-label dataset

Harvard Dataverse (Africa Rice Center, Bioversity International, CCAFS, CIAT, IFPRI, IRRI and WorldFish)

Field Value
 
Title SR-BH 2020 multi-label dataset
 
Identifier https://doi.org/10.7910/DVN/OGOIXX
 
Creator Sureda Riera, Tomás
Bermejo Higuera, Juan Ramón
Bermejo Higuera, Javier
Sicilia Montalvo, Juan Antonio
Martínez Herráiz, José Javier
 
Publisher Harvard Dataverse
 
Description The dataset is composed of web requests collected over 12 days in July 2020 by a WordPress web server installed on a virtual machine and exposed to the Internet. ModSecurity version 2.9.2 for Apache, with Core Rule Set (CRS) version 3.3.0, was installed on this server in "Detection only" mode, so that all requests (legitimate and malicious) were recorded in the ModSecurity log without being blocked. Each day, the ModSecurity logs were collected and the virtual machine was restored to a clean state.

Once the exposure period was over, the collected logs were processed manually and semi-automatically to review the request tagging performed by ModSecurity, correcting the normal/attack assignment of each web request where necessary and ensuring an appropriate CAPEC classification.

The final result is a multi-label dataset aimed especially at web attack detection. It comprises 907,814 requests, of which 525,195 are normal and 382,619 are anomalous; each record has 24 features and a set of 13 labels.
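
As a quick sanity check on the figures quoted above, the following minimal sketch confirms that the two class counts sum to the stated total and derives the class balance. The constant and function names are illustrative only and are not part of the dataset or its tooling.

```python
# Counts quoted in the dataset description (names are illustrative).
TOTAL_REQUESTS = 907_814
NORMAL_REQUESTS = 525_195
ANOMALOUS_REQUESTS = 382_619


def class_shares(normal: int, anomalous: int) -> dict[str, float]:
    """Return the fraction of normal and anomalous requests."""
    total = normal + anomalous
    return {"normal": normal / total, "anomalous": anomalous / total}


# The two classes should account for every request in the dataset.
assert NORMAL_REQUESTS + ANOMALOUS_REQUESTS == TOTAL_REQUESTS

shares = class_shares(NORMAL_REQUESTS, ANOMALOUS_REQUESTS)
print(f"normal: {shares['normal']:.1%}, anomalous: {shares['anomalous']:.1%}")
```

The split is moderately imbalanced toward normal traffic, which may be worth accounting for when choosing evaluation metrics.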
 
Subject Computer and Information Science
 
Contributor Sureda Riera, Tomás