SR-BH 2020 multi-label dataset
Harvard Dataverse (Africa Rice Center, Bioversity International, CCAFS, CIAT, IFPRI, IRRI and WorldFish)
Field | Value | |
Title |
SR-BH 2020 multi-label dataset
|
|
Identifier |
https://doi.org/10.7910/DVN/OGOIXX
|
|
Creator |
Sureda Riera, Tomás
Bermejo Higuera, Juan Ramón
Bermejo Higuera, Javier
Sicilia Montalvo, Juan Antonio
Martínez Herráiz, José Javier |
|
Publisher |
Harvard Dataverse
|
|
Description |
The dataset consists of web requests collected over 12 days in July 2020 by a WordPress web server installed on a virtual machine and exposed to the Internet. ModSecurity version 2.9.2 for Apache, with Core Rule Set (CRS) version 3.3.0, was installed on this server in "Detection only" mode, so that all requests (legitimate and malicious) were recorded in the ModSecurity log without being blocked. Each day, the logs generated by ModSecurity were collected and the virtual machine was restored to a clean state. Once the exposure period was over, the collected logs were processed manually and semi-automatically to review the request tagging performed by ModSecurity, correcting the normal/attack assignment where necessary and ensuring an appropriate CAPEC classification for each web request. The final result is a multi-label dataset aimed especially at web attack detection, composed of 907,814 requests, of which 525,195 are normal and 382,619 are anomalous; each record has 24 different features and a set of 13 labels. |
|
Subject |
Computer and Information Science
|
|
Contributor |
Sureda Riera, Tomás
|
|