UNIBS: Data sharing
Available traces with associated ground truth
The following traces are available upon request, albeit in anonymized,
payload-stripped form, together with the associated ground truth data collected by the Ground Truth (GT) tool.
The traces come with a log file that reports, for each flow:
- the transport-level port numbers
- the outcome of the deep packet inspection (DPI) analysis, computed on 200 bytes of data per packet using the signature patterns provided by l7-filter
- the name of the application that generated the flow, as reported by GT
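Because the log pairs the DPI outcome with the GT application name for every flow, a natural first step is to cross-tabulate the two. The sketch below is a hypothetical illustration: the log's actual file format is not described here, so `FlowRecord`, its field names, and the `"unknown"` label for flows DPI could not classify are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class FlowRecord:
    # Hypothetical field names mirroring the three items listed in the log file.
    src_port: int
    dst_port: int
    dpi_label: str  # outcome of the l7-filter-based DPI analysis
    gt_app: str     # application name reported by GT


def crosstab(records: Iterable[FlowRecord]) -> Counter:
    """Count flows for each (GT application, DPI label) pair."""
    return Counter((r.gt_app, r.dpi_label) for r in records)


def unknown_share(records: List[FlowRecord], unknown: str = "unknown") -> Dict[str, float]:
    """Fraction of each application's flows that DPI left unclassified.

    The "unknown" label is an assumption about how unclassified flows are marked.
    """
    total = Counter(r.gt_app for r in records)
    unk = Counter(r.gt_app for r in records if r.dpi_label == unknown)
    return {app: unk[app] / n for app, n in total.items()}
```

Comparing DPI labels against GT in this way shows, per application, how often signature matching agrees with the ground truth, without assuming any mapping between application names and protocol names.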
Traces were anonymized with Crypto-PAn as follows:
- packets are retained only up to the transport-level header
- IP addresses are anonymized with a cryptographic, prefix-preserving scheme
- the same key is used for all traces, so a given IP address maps to the same anonymized address in every trace
- the original addresses can be recovered with the key
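Crypto-PAn derives each output bit by XORing the corresponding input bit with a pseudo-random bit computed (via AES with a secret key and pad) from all preceding input bits. The sketch below illustrates this prefix-preserving construction but swaps AES for HMAC-SHA256 from the Python standard library, an assumption made to keep the example dependency-free, so its outputs will not match real Crypto-PAn. It also shows why, with the key, the original address can be recovered bit by bit. The key value is hypothetical.

```python
import hashlib
import hmac
import ipaddress


def _prf_bit(key: bytes, prefix: str) -> int:
    """One pseudo-random bit derived from a bit-string prefix.

    Stand-in for Crypto-PAn's AES-based function (assumption: HMAC-SHA256).
    """
    digest = hmac.new(key, prefix.encode("ascii"), hashlib.sha256).digest()
    return digest[0] & 1


def anonymize(addr: str, key: bytes) -> str:
    """Prefix-preserving anonymization of an IPv4 address."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    # Output bit i = input bit i XOR f(first i input bits), so addresses that
    # share a k-bit prefix produce outputs that share exactly a k-bit prefix.
    out = "".join(str(int(b) ^ _prf_bit(key, bits[:i])) for i, b in enumerate(bits))
    return str(ipaddress.IPv4Address(int(out, 2)))


def deanonymize(anon: str, key: bytes) -> str:
    """Invert anonymize(); feasible only when the key is known."""
    bits = format(int(ipaddress.IPv4Address(anon)), "032b")
    orig = ""
    for b in bits:
        # The mask for bit i depends only on original bits already recovered.
        orig += str(int(b) ^ _prf_bit(key, orig))
    return str(ipaddress.IPv4Address(int(orig, 2)))


if __name__ == "__main__":
    key = b"hypothetical-secret-key"
    for a in ("192.168.1.10", "192.168.1.20", "10.0.0.1"):
        anon = anonymize(a, key)
        print(a, "->", anon, "->", deanonymize(anon, key))
```

Since the function is deterministic for a fixed key, reusing one key across traces yields the consistent cross-trace mapping described above.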
How to get access to the traces
For information about how to get access to the traces, please contact email@example.com.
In your email, please state your affiliation, your role
(academic researcher or student), and give a brief description of your research.
Upon approval you will receive an email with instructions on how to download the data you requested.
The traces were collected on the edge router of the campus network of the University of Brescia on three consecutive working days (09/30, 10/01 and 10/02). They consist of traffic generated by a set of twenty workstations running the GT client daemon. We captured the traffic by running tcpdump on the Faculty's router, a dual-Xeon Linux machine that connects our network to the Internet through a dedicated 100 Mb/s uplink. The capture files were written to a separate hard disk attached to the router through its own ATA controller.
We obtained around 27 GB of data, composed mainly of TCP (99%) and UDP traffic, corresponding to around 79,000 flows in total. The traffic includes Web (HTTP and HTTPS), Mail (POP3, IMAP4, SMTP and their SSL variants), Skype, Peer-to-Peer applications such as BitTorrent and eDonkey, and other protocols (FTP, SSH and MSN). Details are reported in Table 1. The anonymized, payload-stripped traces occupy around 2.7 GB of disk space.
| Class of protocols | Flows | Bytes |
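The page does not define how flows were delimited when counting them. A common convention, assumed here for illustration only, is to group packets by a direction-independent 5-tuple (protocol, addresses, ports). The sketch below ignores idle timeouts, so a 5-tuple reused later would be merged into one flow; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# (src_ip, src_port, dst_ip, dst_port, protocol)
Packet = Tuple[str, int, str, int, str]


def flow_key(pkt: Packet) -> Tuple[str, str, int, str, int]:
    """Direction-independent 5-tuple: both directions of a connection share one key."""
    src_ip, src_port, dst_ip, dst_port, proto = pkt
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    # Order the two endpoints canonically so A->B and B->A give the same key.
    return (proto,) + (a + b if a <= b else b + a)


def count_flows(packets: Iterable[Packet]) -> Counter:
    """Number of distinct flows per transport protocol (no timeouts applied)."""
    keys = {flow_key(p) for p in packets}
    return Counter(k[0] for k in keys)
```

For example, a TCP request and its reply count as a single flow, while a second connection from a different client port counts as a new one.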
We estimated the packet loss affecting the capture process in two ways: by processing the tcpdump log, and by checking all correctly terminated TCP sessions for missing segments. The second check was meant to reveal losses not logged by tcpdump, such as overflows of the FIFO queues used for DMA transfers between the network adapter and the kernel. The worst packet loss reported by tcpdump was below 1%, and the bytes missing from complete TCP sessions amounted to less than 0.22%, showing that losses before kernel processing were negligible.
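The missing-segment check can be sketched as follows. For one direction of a correctly terminated TCP session, the payload should cover every sequence number from ISN + 1 (the SYN consumes one) up to the FIN's sequence number; any byte not covered by a captured segment was lost. This is an illustration under stated assumptions, not the tool used for the analysis: the function names and input format are hypothetical, and 32-bit sequence-number wraparound is ignored for brevity.

```python
from typing import Iterable, List, Tuple

Segment = Tuple[int, int]  # (sequence number, payload length)


def missing_bytes(segments: List[Segment], isn: int, fin_seq: int) -> Tuple[int, int]:
    """Return (missing, expected) payload bytes for one direction of a session."""
    expected = fin_seq - (isn + 1)
    covered = 0
    next_needed = isn + 1  # first byte not yet covered by any segment
    for seq, length in sorted(segments):
        # Clip each segment to the part not already covered (handles retransmissions).
        start = max(seq, next_needed)
        end = min(seq + length, fin_seq)
        if end > start:
            covered += end - start
            next_needed = end
    return expected - covered, expected


def loss_fraction(sessions: Iterable[Tuple[List[Segment], int, int]]) -> float:
    """Byte-level loss aggregated over many (segments, isn, fin_seq) sessions."""
    missing = expected = 0
    for segments, isn, fin_seq in sessions:
        m, e = missing_bytes(segments, isn, fin_seq)
        missing += m
        expected += e
    return missing / expected if expected else 0.0
```

For instance, `missing_bytes([(1001, 100), (1201, 100)], isn=1000, fin_seq=1301)` returns `(100, 300)`: the segment starting at 1101 was never captured.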
UNIBS-2009 SSH tunnel
The traces were collected on three working days (06/16, 06/17 and 06/24) by routing all the traffic of a network on the campus of the University of Brescia through an SSH tunnel. For this purpose, we developed SSHgate, a freely available tool.
SSHgate allows us to correlate each encrypted session with the corresponding outer flow: by ascertaining the nature of the outer flow, we provide ground truth for the corresponding encrypted session. For more details, please refer to our SSHgate paper.
We release the traces in anonymized form, with the associated ground truth as provided by GT.