HW2: Apache log processing
- Due Jan 27 by 9am
- Points 1
- Submitting a file upload
- Available after Jan 24 at 9:30am
Readings
- "Back to Basics: Sort and Uniq", Kyle Rankin, 2019
- "cut command in Linux with examples", Geeks for Geeks, 2024
Remember that you can usually run "command --help" to find out how to use a command, or google "man command" to find the man(ual) page for that command (e.g., see the man pages for sort, uniq, and cut).
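As a quick illustration of that kind of lookup, here is a minimal sketch using sort. It assumes a GNU-style build (as on Linux, WSL, or Git Bash); some other builds may not accept --help and will print a short usage message instead.

```shell
# Print the first few lines of the built-in help for sort.
# 2>&1 also captures a usage message if this build writes it to stderr.
help_text=$(sort --help 2>&1 | head -n 5)
echo "$help_text"

# The full manual page opens in a pager (press q to quit):
# man sort
```

The same pattern works for uniq and cut; piping through head just keeps the output short enough to skim.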
Exercise
First, clone the course Git repository from GitHub: https://github.com/hafeild/csc440sp25. I have some videos on how to use Git, including how to clone a repository, but feel free to find your own resources on how to clone a repository.
In the data folder, download the Apache logs according to the instructions in the README.
From a terminal running Bash (on Windows, use Git Bash or WSL) or zsh (on macOS), answer the two questions below and take a screenshot of the terminal when you answer each. Paste your screenshots into a Google or Word document and upload that document to this assignment.
- How many total log events are there across all of the log files?
  - Do this without using gunzip to decompress the gzipped files (you should use one or more of zcat/zless/etc.).
- Which day had the most events?
  - Pipe together zcat, cut, sort, uniq, and tail.
  - You'll need to figure out which flags to pass to each of these commands.
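To show how these commands fit together, here is a minimal sketch on made-up data rather than the Apache logs. The file names, the comma-separated layout, and the fruit values are all hypothetical, and the delimiter and field number you need for the real logs will differ. The sketch stops at per-value counts; ranking them and picking the largest is left as part of the exercise. On macOS, zcat may need to be replaced with gzcat or gzip -dc.

```shell
# Build two tiny gzipped sample files in a temporary directory (made-up data).
tmpdir=$(mktemp -d)
printf 'apple,red\nbanana,yellow\napple,green\n' | gzip > "$tmpdir/a.csv.gz"
printf 'cherry,red\napple,red\nbanana,yellow\n' | gzip > "$tmpdir/b.csv.gz"

# Stream the decompressed contents of every file to wc without writing
# decompressed copies to disk. tr strips the padding some wc builds add.
total=$(zcat "$tmpdir"/*.gz | wc -l | tr -d ' ')
echo "total lines: $total"

# Keep only the first comma-separated field, then group identical values.
# uniq only collapses adjacent duplicates, so the input must be sorted first.
counts=$(zcat "$tmpdir"/*.gz | cut -d ',' -f 1 | sort | uniq -c)
echo "$counts"

# Clean up the sample files.
rm -r "$tmpdir"
```

Each line of the final output pairs a count with a value, which is the shape of data you would then order and trim with tail.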
When doing these, I'm looking for a good effort, not necessarily the exact right answer. Don't shortchange yourselves by just trying random things; putting in the time to really give this a good go will pay off in the long run, even if you don't get the exact right answer. You may work on this with others in the class.