We have collected several file system traces for use in our simulations. When we started this work, no public file traces recorded the file-related events we needed at the desired level of detail. Consequently, we modified the SunOS 4.1 kernel to gather our own traces. The traces include every read(), write(), open(), close(), lseek(), dup(), dup2(), exec(), unlink(), and creat() done by every process. Each event is timestamped down to the millisecond, and detailed information related to the event (such as the size of the file and the process that issued the event) is also captured and reported. Each trace contains several weeks of activity.
There are two traces that we have used in our simulations; each represents a distinct type of work environment.
The Sitar trace recorded user activity on a publicly available SPARCstation. Most users accessed this workstation remotely via telnet, rlogin, and X-applications. Typical activity included reading mail and news, authoring (editing) papers and memos, and executing common utility programs such as ls, cp, grep, wc, and other standard Unix programs. Mail, news and utility applications dominated this trace. The Sitar trace represents typical non-technical use in an academic environment. We feel that such use (reading mail, authoring documents, etc.) is representative of many corporate, administrative, or government office environments. The Sitar trace spans approximately ten days.
The Harp trace was gathered on a SPARCstation reserved for use by a research project. The trace is dominated by two collaborating programmers working on a large software project. Their work consisted almost entirely of edit/compile/run/debug cycles on a large multimedia application. The trace lasts about seven days and is representative of common programmer activity.
Because these traces are extremely large, we have not made them publicly downloadable; unrestricted downloads could noticeably slow our network. We will gladly give FTP access to serious users. To obtain a copy, send email to randy@dcs.uky.edu or griff@dcs.uky.edu.
Characteristics of Two Traces

| | Sitar Trace | Harp Trace |
|---|---|---|
| Ratio of Write to Read | 0.86 | 0.48 |
| Ratio of Exec to Read | 0.11 | 0.12 |
| Number of Reads | 25,699 | 57,018 |
| Total I/O | 396,761 KB | 770,641 KB |
| Total Unique Files | 3,453 | 2,408 |
| Avg. App. Read Wait Time | 4.032 ms | 3.463 ms |
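
The ratios in the table can be turned into rough event counts. The sketch below is only an illustration and rests on an assumption the page does not confirm: that "Ratio of Write to Read" and "Ratio of Exec to Read" compare event counts rather than bytes. The `traces` dictionary and the `estimate_counts` helper are hypothetical names introduced here; the numbers are copied from the table. Because the ratios are rounded to two decimals, the results are approximate.

```python
# Values copied from the table above. The derived counts are estimates only.
traces = {
    "Sitar": {"reads": 25_699, "write_to_read": 0.86, "exec_to_read": 0.11},
    "Harp": {"reads": 57_018, "write_to_read": 0.48, "exec_to_read": 0.12},
}


def estimate_counts(stats):
    """Estimate write and exec event counts from the read count.

    Assumes (not confirmed by the source) that each ratio is a ratio of
    event counts, so count = ratio * number of reads.
    """
    reads = stats["reads"]
    return {
        "writes": round(reads * stats["write_to_read"]),
        "execs": round(reads * stats["exec_to_read"]),
    }


for name, stats in traces.items():
    print(name, estimate_counts(stats))
```

Under that assumption, Sitar would contain roughly 22,100 writes and Harp roughly 27,400, even though Harp has more than twice as many reads.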
<6>TIME 769118084 + 167026<6>OPEN Comm csh Pid 120 Path /etc/termcap FD 0 Size 134340 Vnode 332898396
<6>TIME 769118084 + 168091<6>READ Pid 120 FD 0 Len 1024 NewOffset 1024 Vnode 332898396 Started 769118084 + 167532
<6>TIME 769118084 + 190397<6>CLOSE 0 Pid 120 Vnode 0
<6>TIME 769118100 + 6965<6>EXEC compress Size 24576 Pid 123 Vnode 332832355
<6>TIME 769118100 + 117173<6>MMAP Pid 123 FD 3 Offset 0 Len 40960 Addr -142737408 Vnode 332878710
<6>TIME 769118084 + 188983<6>DUP Pid 120 18 2 Vnode 0
<6>TIME 769118117 + 138613<6>LSEEK Vnode 0 NewOffset 0
The following is the key to interpreting the example file log shown above. The 'Pid' and 'FD' fields are the process ID and the file descriptor, respectively. The three numbers after the DUP command are the process ID, the old FD, and the new FD. The 'Len' and 'Size' fields are in bytes. The 'TIME' field is in seconds + microseconds format. The 'Vnode' field is a unique ID for each file. The second time value in a READ record (the 'Started' field) shows when the READ returned control to the user application.
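
To make the key concrete, here is a minimal sketch of how one trace record might be parsed. It is not the tool used to produce or analyze the traces. It assumes that each record occupies a single line (any wrapped lines have been rejoined) and that the only field labels are the ones visible in the sample. The function name `parse_record` and the dictionary keys `event`, `time_us`, `started_us`, and `args` are hypothetical choices made for this illustration.

```python
import re

# Header shared by every sample record: "<6>TIME sec + usec<6>EVENT rest".
HEADER = re.compile(r"^<6>TIME (\d+) \+ (\d+)<6>(\w+)\s*(.*)$")

# Field labels seen in the sample log; real traces may contain others.
KEYS = {"Comm", "Pid", "Path", "FD", "Size", "Vnode",
        "Len", "NewOffset", "Offset", "Addr"}

INTEGER = re.compile(r"-?\d+")


def parse_record(line):
    """Parse one trace line into a dict, or return None if it does not match."""
    match = HEADER.match(line.strip())
    if match is None:
        return None
    sec, usec, event, rest = match.groups()
    record = {
        "event": event,
        # Combine "seconds + microseconds" into a single microsecond count.
        "time_us": int(sec) * 1_000_000 + int(usec),
        # Unlabeled values (e.g. the two FDs after DUP) are kept as strings.
        "args": [],
    }
    tokens = rest.split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "Started" and i + 3 < len(tokens):
            # "Started sec + usec" is the second timestamp of a READ record.
            record["started_us"] = (int(tokens[i + 1]) * 1_000_000
                                    + int(tokens[i + 3]))
            i += 4
        elif tok in KEYS and i + 1 < len(tokens):
            value = tokens[i + 1]
            record[tok] = int(value) if INTEGER.fullmatch(value) else value
            i += 2
        else:
            record["args"].append(tok)
            i += 1
    return record


sample = ("<6>TIME 769118084 + 168091<6>READ Pid 120 FD 0 Len 1024 "
          "NewOffset 1024 Vnode 332898396 Started 769118084 + 167532")
read = parse_record(sample)
print(read["event"], read["Pid"], read["Len"])
print(read["time_us"] - read["started_us"], "microseconds between the two timestamps")
```

For the sample READ record, the two timestamps differ by 559 microseconds. Keeping unlabeled tokens in `args` lets the same loop handle records such as DUP, where the old and new file descriptors carry no label.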