Available traces with associated ground truth

The following traces are available upon request in anonymized, payload-stripped form, together with the associated ground truth data collected by the Ground Truth (GT) system.
The traces come with a log file which indicates for each flow:

Based on this information, we apply a heuristic to infer the application protocol that generated each network flow (please refer to the GT paper for more details).

Anonymization scheme

Traces were anonymized by means of Crypto-PAn as follows:

How to get access to the traces

For information about how to get access to the traces, please contact netsignal@ing.unibs.it.

In the email, please specify your affiliation, your role (academic researcher or student) and a brief description of your research goals.
Upon approval, you will receive an email with instructions on how to download the data you requested.

Datasets

UNIBS-2009

The traces were collected on the edge router of the campus network of the University of Brescia on three consecutive working days (09/30, 10/01 and 10/02). They consist of traffic generated by a set of twenty workstations running the GT client daemon. We collected the traffic by running Tcpdump on the Faculty's router, a dual Xeon Linux box that connects our network to the Internet through a dedicated 100 Mb/s uplink. Traces were saved to capture files on a dedicated hard disk connected to the router through a dedicated ATA controller.

We ended up with around 27 GB of data, mainly composed of TCP (99%) and UDP traffic, which corresponds to around 79,000 flows in total. The traffic includes Web (HTTP and HTTPS), Mail (POP3, IMAP4, SMTP and their SSL variants), Skype, traffic generated by Peer-to-Peer applications such as BitTorrent and eDonkey, and other protocols (FTP, SSH and MSN). Details are reported in Table 1. The anonymized and payload-stripped traces occupy around 2.7 GB of disk space.

Class of protocols    Flows    Bytes
Web                   61.2%    12.5%
Mail                   5.7%     0.2%
P2P (BitTorrent)       9.3%    15.9%
P2P (eDonkey)         18.4%    70.2%
Skype (TCP)            1.4%     1.0%
Skype (UDP)            3.8%     0.0%
Other                  0.2%     0.2%
Total                 78998    27 GB

Table 1. Composition of the UNIBS-2009 trace.
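
For readers who prefer absolute figures, the shares in Table 1 can be converted into approximate per-class flow counts and volumes. The following is a minimal sketch: the function and variable names are ours, the percentages are copied from Table 1, and the 27 GB total is the rounded figure quoted above, so the results are estimates only.

```python
# Shares copied from Table 1: class -> (percent of flows, percent of bytes).
TABLE_1 = {
    "Web": (61.2, 12.5),
    "Mail": (5.7, 0.2),
    "P2P (BitTorrent)": (9.3, 15.9),
    "P2P (eDonkey)": (18.4, 70.2),
    "Skype (TCP)": (1.4, 1.0),
    "Skype (UDP)": (3.8, 0.0),
    "Other": (0.2, 0.2),
}

TOTAL_FLOWS = 78998    # total number of flows reported in Table 1
TOTAL_GIGABYTES = 27.0  # rounded total volume reported in Table 1


def absolute_counts(table, total_flows, total_gigabytes):
    """Turn percentage shares into approximate flow counts and gigabytes."""
    result = {}
    for name, (flow_pct, byte_pct) in table.items():
        flows = round(total_flows * flow_pct / 100)
        gigabytes = total_gigabytes * byte_pct / 100
        result[name] = (flows, gigabytes)
    return result


if __name__ == "__main__":
    counts = absolute_counts(TABLE_1, TOTAL_FLOWS, TOTAL_GIGABYTES)
    for name, (flows, gigabytes) in counts.items():
        print(f"{name:<18} {flows:>6} flows  {gigabytes:6.2f} GB")
```

Because the published percentages are rounded to one decimal place, the per-class flow counts need not add up exactly to the total.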

We computed the packet loss that affected the capture process by processing the Tcpdump log and by checking all the correctly terminated TCP sessions to discover missing segments. The latter test was performed to determine whether any other losses, besides the ones logged by Tcpdump, affected the capture process, such as overloads of the FIFO queues used by the DMA data transfer between the network adapter and the kernel. The analysis showed that the worst packet loss reported by Tcpdump was below 1%, and that the missing segments from complete TCP sessions amounted to less than 0.22% of the bytes, indicating negligible losses before the kernel processing.
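
The two checks described above can be illustrated with a short sketch. It is not the procedure we actually ran: the function names are hypothetical, the first function assumes the three summary lines that tcpdump prints when it exits, and the second assumes that each TCP direction is given as a list of (relative sequence number, payload length) pairs with no sequence-number wraparound.

```python
import re

# Matches the summary lines tcpdump prints on exit, for example
# "1240 packets received by filter".
_SUMMARY = re.compile(
    r"(\d+) packets (captured|received by filter|dropped by kernel)"
)


def tcpdump_drop_ratio(summary_text):
    """Return dropped / received-by-filter from a tcpdump exit summary.

    The exact meaning of the counters depends on the platform, so the
    ratio is only an approximation of the capture loss.
    """
    counters = {label: int(value) for value, label in _SUMMARY.findall(summary_text)}
    received = counters.get("received by filter", 0)
    dropped = counters.get("dropped by kernel", 0)
    return dropped / received if received else 0.0


def missing_bytes(segments):
    """Return (missing, expected) byte counts for one TCP direction.

    Overlapping segments (retransmissions) are counted once. Only gaps
    between the first and the last captured byte can be detected.
    """
    intervals = sorted(
        (seq, seq + length) for seq, length in segments if length > 0
    )
    if not intervals:
        return 0, 0
    first_byte = intervals[0][0]
    covered_end = intervals[0][1]
    missing = 0
    for start, end in intervals[1:]:
        if start > covered_end:  # hole in the sequence space
            missing += start - covered_end
        covered_end = max(covered_end, end)
    return missing, covered_end - first_byte
```

Summing the two values returned by missing_bytes over all complete sessions and dividing the totals gives a byte-level loss estimate comparable to the figure quoted above.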

UNIBS-2009 SSH tunnel

The traces were collected by routing all the traffic of a given network of the campus of the University of Brescia through an SSH tunnel on three working days (06/16, 06/17 and 06/24). For this purpose, we developed SSHgate, a tool freely available here.

SSHgate allows us to correlate the encrypted sessions with the outer flows: by ascertaining the nature of the outer flow, we provide ground truth to the corresponding encrypted session. For more details, please refer to our SSHgate paper.

We release the traces in anonymized form, with the associated ground truth as provided by GT.