Add Column() filter and Visitors example #11

Merged
merged 12 commits into from
Jul 3, 2019
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -2,3 +2,6 @@ examples/cat/cat
examples/grep/grep
examples/cat2/cat2
examples/echo/echo
examples/head/head
examples/visitors/visitors
.vscode/settings.json
81 changes: 79 additions & 2 deletions README.md
@@ -157,6 +157,55 @@ This is implemented using a type called `ReadAutoCloser`, which takes an `io.Rea

_It is your responsibility to close a pipe if you do not read it to completion_.

## A real-world example

Let's use `script` to write a program that system administrators might actually need. One thing I often find myself doing is counting the most frequent visitors to a website over a given period of time. Given an Apache access log in the Combined Log Format (the Common Log Format plus referer and user-agent fields) like this:

```
212.205.21.11 - - [30/Jun/2019:17:06:15 +0000] "GET / HTTP/1.1" 200 2028 "https://example.com/ "Mozilla/5.0 (Linux; Android 8.0.0; FIG-LX1 Build/HUAWEIFIG-LX1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.156 Mobile Safari/537.36"
```

we would like to extract the visitor's IP address (the first column in the logfile) and count the number of times each address occurs in the file. Finally, we might like to list the top 10 visitors by frequency. In a shell script we might do something like:

```sh
cut -d' ' -f 1 access.log | sort | uniq -c | sort -rn | head
```

There's a lot going on there, and it's pleasing to find that the equivalent `script` program is quite brief:

```go
package main

import (
	"github.com/bitfield/script"
)

func main() {
	script.Stdin().Column(1).Freq().First(10).Stdout()
}
```

(Thanks to Lucas Bremgartner for suggesting this example. You can find the complete [program](examples/visitors/main.go), along with a sample [logfile](examples/visitors/access.log), in the [`examples/visitors/`](examples/visitors) directory.)

## Quick start: Unix equivalents

If you're already familiar with shell scripting and the Unix toolset, here is a rough guide to the equivalent `script` operation for each listed Unix command.

|Unix / shell|`script` equivalent|
|---|---|
|(any program name)|`Exec()`|
|`>`|`WriteFile()`|
|`>>`|`AppendFile()`|
|`$*`|`Args()`|
|`cat`|`File()` / `Concat()`|
|`cut`|`Column()`|
|`echo`|`Echo()`|
|`grep`|`Match()` / `MatchRegexp()`|
|`grep -v`|`Reject()` / `RejectRegexp()`|
|`head`|`First()`|
|`uniq -c`|`Freq()`|
|`wc -l`|`CountLines()`|

## Sources, filters, and sinks

`script` provides three types of pipe operations: sources, filters, and sinks.
@@ -267,6 +316,34 @@ fmt.Println(output)

Filters are operations on an existing pipe that also return a pipe, allowing you to chain filters indefinitely.

### Column

`Column()` reads input tabulated by whitespace, and outputs only the Nth column of each input line (like Unix `cut`). Columns are numbered starting from 1. Lines containing fewer than N columns are ignored.

For example, given this input:

```
PID TT STAT TIME COMMAND
1 ?? Ss 873:17.62 /sbin/launchd
50 ?? Ss 13:18.13 /usr/libexec/UserEventAgent (System)
51 ?? Ss 22:56.75 /usr/sbin/syslogd
```

and this program:

```go
script.Stdin().Column(1).Stdout()
```

this will be the output:

```
PID
1
50
51
```
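
The skipping behaviour is easiest to see with a line that is too short. The following is a minimal stdlib-only sketch of the same rule; the `column` helper is a hypothetical stand-in written for this illustration, not the library's method, and treating a column number below 1 as "select nothing" is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// column returns the nth whitespace-separated field (counting from 1) of
// each line of input, one per output line. Lines with fewer than n fields
// are skipped, and nothing is selected when n is less than 1.
func column(input string, n int) string {
	var out strings.Builder
	for _, line := range strings.Split(input, "\n") {
		fields := strings.Fields(line)
		if n < 1 || n > len(fields) {
			continue
		}
		out.WriteString(fields[n-1])
		out.WriteByte('\n')
	}
	return out.String()
}

func main() {
	input := "PID TT STAT TIME COMMAND\n" +
		"1 ?? Ss 873:17.62 /sbin/launchd\n" +
		"bogus\n" +
		"51 ?? Ss 22:56.75 /usr/sbin/syslogd\n"
	// Prints STAT, Ss, Ss on separate lines; "bogus" has no third column.
	fmt.Print(column(input, 3))
}
```

Because any run of whitespace separates fields, a value that itself contains spaces, such as `/usr/libexec/UserEventAgent (System)` in the input above, counts as two columns.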

### Concat

`Concat()` reads a list of filenames from the pipe, one per line, and creates a pipe which concatenates the contents of those files. For example, if you have files `a`, `b`, and `c`:
@@ -619,8 +696,7 @@ These are some ideas I'm playing with for additional features. If you feel like

### Filters

* `Column()` reads columnar (whitespace-separated) data and cuts the specified column, like Unix `cut`
* `CountFreq()` counts the frequency of input lines, and prepends each unique line with its frequency (like Unix `uniq -c`). The results are sorted in descending numerical order (that is, most frequent lines first).
* [Ideas welcome!](https://github.com/bitfield/script/issues/new)

### Sinks

@@ -635,6 +711,7 @@ Since `script` is designed to help you write system administration programs, a f
* [grep](examples/grep/main.go)
* [head](examples/head/main.go)
* [echo](examples/echo/main.go)
* [visitors](examples/visitors/main.go)

[More examples would be welcome!](https://github.com/bitfield/script/pulls)

25 changes: 25 additions & 0 deletions examples/visitors/access.log
@@ -0,0 +1,25 @@
212.205.21.11 - - [30/Jun/2019:17:06:15 +0000] "GET / HTTP/1.1" 200 2028 "https://example.com/ "Mozilla/5.0 (Linux; Android 8.0.0; FIG-LX1 Build/HUAWEIFIG-LX1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.156 Mobile Safari/537.36"
212.205.21.11 - - [30/Jun/2019:17:06:15 +0000] "GET / HTTP/1.1" 200 162544 "https://example.com/ "Mozilla/5.0 (Linux; Android 8.0.0; FIG-LX1 Build/HUAWEIFIG-LX1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.156 Mobile Safari/537.36"
212.205.21.11 - - [30/Jun/2019:17:06:15 +0000] "GET / HTTP/1.1" 200 9419 "https://example.com/ "Mozilla/5.0 (Linux; Android 8.0.0; FIG-LX1 Build/HUAWEIFIG-LX1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.156 Mobile Safari/537.36"
212.205.21.11 - - [30/Jun/2019:17:06:15 +0000] "GET / HTTP/1.1" 200 2058 "https://example.com/ "Mozilla/5.0 (Linux; Android 8.0.0; FIG-LX1 Build/HUAWEIFIG-LX1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.156 Mobile Safari/537.36"
212.205.21.11 - - [30/Jun/2019:17:06:15 +0000] "GET / HTTP/1.1" 200 343743 "https://example.com/ "Mozilla/5.0 (Linux; Android 8.0.0; FIG-LX1 Build/HUAWEIFIG-LX1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.156 Mobile Safari/537.36"
212.205.21.11 - - [30/Jun/2019:17:06:16 +0000] "GET / HTTP/1.1" 200 1150 "https://example.com/ "Mozilla/5.0 (Linux; Android 8.0.0; FIG-LX1 Build/HUAWEIFIG-LX1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.156 Mobile Safari/537.36"
212.205.21.11 - - [30/Jun/2019:17:06:16 +0000] "GET / HTTP/1.1" 200 2946 "https://example.com/ "Mozilla/5.0 (Linux; Android 8.0.0; FIG-LX1 Build/HUAWEIFIG-LX1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.156 Mobile Safari/537.36"
176.182.2.191 - - [30/Jun/2019:17:06:17 +0000] "GET / HTTP/1.1" 200 13278 "https://example.com/ "Mozilla/5.0 (iPhone; CPU iPhone OS 12_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Mobile/15E148 Safari/604.1"
176.182.2.191 - - [30/Jun/2019:17:06:19 +0000] "GET / HTTP/1.1" 200 29474 "https://example.com/ "Mozilla/5.0 (iPhone; CPU iPhone OS 12_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Mobile/15E148 Safari/604.1"
176.182.2.191 - - [30/Jun/2019:17:06:19 +0000] "GET / HTTP/1.1" 200 29349 "https://example.com/ "Mozilla/5.0 (iPhone; CPU iPhone OS 12_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Mobile/15E148 Safari/604.1"
176.182.2.191 - - [30/Jun/2019:17:06:19 +0000] "GET / HTTP/1.1" 200 48271 "https://example.com/ "Mozilla/5.0 (iPhone; CPU iPhone OS 12_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Mobile/15E148 Safari/604.1"
176.182.2.191 - - [30/Jun/2019:17:06:19 +0000] "GET / HTTP/1.1" 200 1380 "https://example.com/ "Mozilla/5.0 (iPhone; CPU iPhone OS 12_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Mobile/15E148 Safari/604.1"
176.182.2.191 - - [30/Jun/2019:17:06:20 +0000] "GET / HTTP/1.1" 200 2028 "https://example.com/ "Mozilla/5.0 (iPhone; CPU iPhone OS 12_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Mobile/15E148 Safari/604.1"
176.182.2.191 - - [30/Jun/2019:17:06:19 +0000] "GET / HTTP/1.1" 200 91819 "https://example.com/ "Mozilla/5.0 (iPhone; CPU iPhone OS 12_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Mobile/15E148 Safari/604.1"
176.182.2.191 - - [30/Jun/2019:17:06:19 +0000] "GET / HTTP/1.1" 200 305667 "https://example.com/ "Mozilla/5.0 (iPhone; CPU iPhone OS 12_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Mobile/15E148 Safari/604.1"
176.182.2.191 - - [30/Jun/2019:17:06:20 +0000] "GET / HTTP/1.1" 200 13194 "https://example.com/ "Mozilla/5.0 (iPhone; CPU iPhone OS 12_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Mobile/15E148 Safari/604.1"
176.182.2.191 - - [30/Jun/2019:17:06:20 +0000] "GET / HTTP/1.1" 200 12935 "https://example.com/ "Mozilla/5.0 (iPhone; CPU iPhone OS 12_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Mobile/15E148 Safari/604.1"
176.182.2.191 - - [30/Jun/2019:17:06:20 +0000] "GET / HTTP/1.1" 200 14598 "https://example.com/ "Mozilla/5.0 (iPhone; CPU iPhone OS 12_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Mobile/15E148 Safari/604.1"
176.182.2.191 - - [30/Jun/2019:17:06:20 +0000] "GET / HTTP/1.1" 200 22458 "https://example.com/ "Mozilla/5.0 (iPhone; CPU iPhone OS 12_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Mobile/15E148 Safari/604.1"
176.182.2.191 - - [30/Jun/2019:17:06:20 +0000] "GET / HTTP/1.1" 200 15737 "https://example.com/ "Mozilla/5.0 (iPhone; CPU iPhone OS 12_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Mobile/15E148 Safari/604.1"
176.182.2.191 - - [30/Jun/2019:17:06:20 +0000] "GET / HTTP/1.1" 404 17679 "https://example.com/ "Mozilla/5.0 (iPhone; CPU iPhone OS 12_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Mobile/15E148 Safari/604.1"
176.182.2.191 - - [30/Jun/2019:17:06:23 +0000] "GET / HTTP/1.1" 200 5995 "https://example.com/ "Mozilla/5.0 (iPhone; CPU iPhone OS 12_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Mobile/15E148 Safari/604.1"
190.253.121.1 - - [30/Jun/2019:17:06:23 +0000] "GET / HTTP/1.1" 200 8809 "-" "Mozilla/5.0 (Linux; Android 9; SAMSUNG SM-J415FN Build/PPR1.180610.011) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/9.2 Chrome/67.0.3396.87 Mobile Safari/537.36"
176.182.2.191 - - [30/Jun/2019:17:06:24 +0000] "GET / HTTP/1.1" 200 162544 "https://example.com/ "Mozilla/5.0 (iPhone; CPU iPhone OS 12_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Mobile/15E148 Safari/604.1"
90.53.111.17 - - [30/Jun/2019:17:06:24 +0000] "GET / HTTP/1.1" 200 - "https://example.com/ "Mozilla/5.0 (Linux; Android 9; SAMSUNG SM-J415FN Build/PPR1.180610.011) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/9.2 Chrome/67.0.3396.87 Mobile Safari/537.36"
5 changes: 5 additions & 0 deletions examples/visitors/go.mod
@@ -0,0 +1,5 @@
module visitors

go 1.12

require github.com/bitfield/script v0.9.0
24 changes: 24 additions & 0 deletions examples/visitors/main.go
@@ -0,0 +1,24 @@
/*
This program reads an Apache logfile in Combined Log Format, like this:

212.205.21.11 - - [30/Jun/2019:17:06:15 +0000] "GET / HTTP/1.1" 200 2028 "https://example.com/ "Mozilla/5.0 (Linux; Android 8.0.0; FIG-LX1 Build/HUAWEIFIG-LX1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.156 Mobile Safari/537.36"

It extracts the first column of each line (the visitor IP address), counts the
frequency of each unique IP address, and outputs the 10 most frequent visitors.
Example output:

16 176.182.2.191
7 212.205.21.11
1 190.253.121.1
1 90.53.111.17

*/
package main

import (
	"github.com/bitfield/script"
)

func main() {
	script.Stdin().Column(1).Freq().First(10).Stdout()
}
13 changes: 13 additions & 0 deletions filters.go
@@ -194,3 +194,16 @@ func (p *Pipe) Freq() *Pipe {
	}
	return Echo(output.String())
}

// Column reads from the pipe, and returns a new pipe containing only the Nth
// column of each line in the input (where columns are delimited by whitespace).
// Columns are numbered from 1; lines with fewer than N columns are skipped, and
// a column number less than 1 selects nothing rather than panicking.
// If there is an error reading the pipe, the pipe's error status is also set.
func (p *Pipe) Column(col int) *Pipe {
	return p.EachLine(func(line string, out *strings.Builder) {
		columns := strings.Fields(line)
		// Guard both ends of the range: col-1 must be a valid index.
		if col > 0 && col <= len(columns) {
			out.WriteString(columns[col-1])
			out.WriteByte('\n')
		}
	})
}
15 changes: 15 additions & 0 deletions filters_test.go
@@ -270,3 +270,18 @@ func TestFreq(t *testing.T) {
		t.Errorf("want %q, got %q", want, got)
	}
}

func TestColumn(t *testing.T) {
	t.Parallel()
	want, err := ioutil.ReadFile("testdata/column.golden.txt")
	if err != nil {
		t.Fatal(err)
	}
	got, err := File("testdata/column.input.txt").Column(3).Bytes()
	if err != nil {
		t.Error(err)
	}
	if !bytes.Equal(got, want) {
		t.Errorf("want %q, got %q", want, got)
	}
}
2 changes: 2 additions & 0 deletions pipes_test.go
@@ -104,6 +104,8 @@ func doMethodsOnPipe(t *testing.T, p *Pipe, kind string) {
	p.First(1)
	action = "Freq()"
	p.Freq()
	action = "Column()"
	p.Column(2)
}

func TestNilPipes(t *testing.T) {
8 changes: 8 additions & 0 deletions testdata/column.golden.txt
@@ -0,0 +1,8 @@
Ss+
R+
Ss
Ss+
Ss+
Ss+
Ss+
Ss
9 changes: 9 additions & 0 deletions testdata/column.input.txt
@@ -0,0 +1,9 @@
60916 s003 Ss+ 0:00.51 /bin/bash -l
6653 s004 R+ 0:00.01 ps ax
bogus line
80159 s004 Ss 0:00.56 /bin/bash -l
60942 s006 Ss+ 0:00.53 /bin/bash -l
60943 s007 Ss+ 0:00.51 /bin/bash -l
60977 s009 Ss+ 0:00.52 /bin/bash -l
60978 s010 Ss+ 0:00.53 /bin/bash -l
61356 s011 Ss 0:00.54 /bin/bash -l