---
title: "How Not to Break a Search Engine or: What I Learned About Unglamorous Engineering"
author: Rijnard van Tonder
authorUrl: https://twitter.com/rvtond
publishDate: 2021-06-23T10:00-07:00
tags: [blog, code, search, software, engineering, testing]
slug: how-to-not-break-a-search-engine-unglamorous-engineering
heroImage:
socialImage: https://about.sourcegraph.com/blog/sourcegraph-social-img.png
published: true
---

<p style="padding: 0px 40px 0px 40px; text-align: justify;">“<i>In 2020 I flipped the switch to use the completely rewritten parser for
Sourcegraph search queries. It serves tens of thousands of users and processes
millions of queries. And after flipping the switch... nothing remarkable
happened. Nobody noticed. Zero reports of implementation bugs. Everything had
gone as planned. I'd pulled off my greatest feat of unglamorous engineering.</i>”</p>


Early last year I started rewriting the parser that processes search queries in
Sourcegraph: the queries that users type into the search bar. This component
processes every single input that users enter when they search for code:

<img src="https://storage.googleapis.com/sourcegraph-assets/about.sourcegraph.com/blog/2021/search-bar.png" style="width: 660px">

When the switch activated the new parser in September 2020, you’d never have known
that anything had changed. This is an account of the invisible, rigorous testing
that happened behind the scenes to make this seamless switch possible.

The motivation for the change was to introduce standard boolean operators like
`and`, `or`, and `not` in queries. Sourcegraph had supported some of these
notions through regular expressions, but we were missing more general
functionality, and the syntax to accompany it. This new functionality meant
rewriting the query parser from scratch (I'm going to gloss over why it had to
be done from scratch, but just bear with me, this wasn't a component we could
retrofit). These boolean operators were a clear product win. I was excited that
a rewrite could later unlock new prospects for our search syntax. I knew these
were the goals. But to get to the goal I had to suspend my focus on the shiny
"feature" part of this work. Something else dwarfed other concerns.

Sourcegraph's bread and butter is code search. Every single search query is
going to run through this new code. Bugs in a beta feature are understandable.
A UI bug may be temporarily excusable. But a bug in the query processing of the
core product could spell a lot of very real trouble. What if previously working
queries stop working? What if they behave incorrectly? What if an input crashes
the server? The point was to support new operators in the query, and I was
going to introduce these, but mentally I was more preoccupied with ensuring
that _existing queries keep working_ the way that they should.

Writing a parser isn't _that_ difficult. But it's a different matter to write
one from scratch for a fresh project versus swapping out something that is in
use by thousands of users. When the stakes are high, simple things become
challenging. This code will need checking and checking again. Not exactly
glamorous.

The changes weren't just about the act of parsing either. The new parser would
produce a different internal representation of a query. Trees would represent
our queries now, no longer just a sequential list of terms. This new
representation needed to keep working with what our backend internals expect.
That, or bugs galore.

I made up my mind that, as far as I could help it, swapping our parsers wasn't
going to break Sourcegraph. Not on my account.

## How not to break a search engine

Over the course of development I progressively tested new changes in four
distinct ways. I wasn't following a checklist of things to do. What mattered
was that I could convince myself that things worked as expected. I wanted
to sleep well at night. Testing along these four different dimensions gave me
confidence that I could expect few (if any) surprises. In general, the sort of
testing needed for application rewrites or migrations is going to depend on the
context. I feel like the steps I took worked well, and they probably generalize
well to migrating other systems that accept user input.

### Part 1: Unit testing

Unit testing was the obvious thing to do™ and familiar territory for a lot of
developers, so I won't belabor this part. I thought of this as my front line
defense for testing correctness. I reused some of our existing parser tests
(which served as a rough specification of things to get right) and also added a
lot of additional tests for new parts of the syntax. You bet there's test
coverage.


<img src="https://storage.googleapis.com/sourcegraph-assets/about.sourcegraph.com/blog/2021/parser-coverage.png" style="width: 300px">
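
If you're curious what these look like, here's a minimal, hypothetical sketch of a table-driven parser test in Go. `Parse`, the `String` rendering, and the expected outputs are stand-ins for illustration, not the actual Sourcegraph code:

```go
package query

import "testing"

// A hypothetical table-driven parser test: each case pairs a raw query with the
// expected rendering of its parse tree. Parse and the String rendering are
// stand-ins for the real entry point and output format.
func TestParse(t *testing.T) {
	cases := []struct {
		input string
		want  string
	}{
		{`repo:sourcegraph foo`, `(and (repo "sourcegraph") (pattern "foo"))`},
		{`foo or bar`, `(or (pattern "foo") (pattern "bar"))`},
		{`not foo`, `(not (pattern "foo"))`},
	}
	for _, c := range cases {
		got, err := Parse(c.input)
		if err != nil {
			t.Fatalf("Parse(%q) returned error: %v", c.input, err)
		}
		if got.String() != c.want {
			t.Errorf("Parse(%q) = %s, want %s", c.input, got.String(), c.want)
		}
	}
}
```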

### Part 2: Integration testing

Once I wrote a feature-flagged code path to use the new parser, it was time to
test whether Sourcegraph behaved in a functionally equivalent way. Integration
tests ran a search query through a running instance of the Sourcegraph
application and checked whether the search results were as expected. This
testing phase was so important because the parser produced a new and different
internal representation. I'd abstracted out a common interface for our backend
to access the new data structure under the feature-flagged code path.

<img src="https://storage.googleapis.com/sourcegraph-assets/about.sourcegraph.com/blog/2021/integration.png" style="width: 800px">
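
As a rough sketch of what such a common interface might look like (all names here are hypothetical, not our actual code): the backend consumes a small interface that both the old and new representations satisfy, and a feature flag decides which parser produces it.

```go
package query

// QueryInfo is a hypothetical common interface: backend code reads query
// properties through it without caring which parser produced them.
type QueryInfo interface {
	Values(field string) []string // values for a field such as "repo" or "lang"
	Pattern() string              // the search pattern text
}

// parseForBackend chooses a parser behind a feature flag; backend code that
// consumes QueryInfo stays untouched when the flag flips.
func parseForBackend(input string, useNewParser bool) (QueryInfo, error) {
	if useNewParser {
		return parseAsTree(input) // new tree-based representation
	}
	return parseAsList(input) // old sequential list of terms
}

// parseAsList and parseAsTree are placeholders for the old and new parsers;
// both produce something that satisfies QueryInfo.
func parseAsList(input string) (QueryInfo, error) { return bare(input), nil }
func parseAsTree(input string) (QueryInfo, error) { return bare(input), nil }

// bare treats the whole input as a single pattern with no fields, purely to
// keep this sketch self-contained.
type bare string

func (b bare) Values(string) []string { return nil }
func (b bare) Pattern() string        { return string(b) }
```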

When I got to this part, we didn't have a good way to run integration tests. We
had browser-based end-to-end testing that was onerous to set up, time-consuming
to run, and brittle. I was desperate for a way to do this more easily and
quickly, and only really needed to run queries through our backend GraphQL API,
not the browser. I wrote a barebones utility to locally run queries through our
backend and snapshot the results
[#10712](https://github.com/sourcegraph/sourcegraph/pull/10712). This testing
was _immensely_ helpful because I could lock in existing expected search
behaviors for every new addition. I didn't want a user to enter a reasonable
query and get an unreasonable result. It preempted a bunch of behavioral bugs
as I tweaked data structures and interfaces.
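
To give an idea of the shape of that utility, here's a hedged sketch in Go. The GraphQL endpoint path, the query shape, and the snapshot layout are assumptions for illustration, not our exact API:

```go
package integration

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"path/filepath"
)

// runQuery sends a search query to the GraphQL endpoint of a locally running
// instance and returns the raw response. The endpoint path and query shape are
// illustrative only.
func runQuery(endpoint, query string) ([]byte, error) {
	body, err := json.Marshal(map[string]interface{}{
		"query":     `query ($q: String!) { search(query: $q) { results { matchCount } } }`,
		"variables": map[string]string{"q": query},
	})
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var out bytes.Buffer
	if _, err := out.ReadFrom(resp.Body); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

// checkSnapshot compares a response against a golden file on disk, recording
// the response as the snapshot if one doesn't exist yet.
func checkSnapshot(dir, name string, got []byte) error {
	path := filepath.Join(dir, name+".json")
	want, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return os.WriteFile(path, got, 0o644) // first run: lock in current behavior
	}
	if err != nil {
		return err
	}
	if !bytes.Equal(got, want) {
		return fmt.Errorf("%s: results differ from the recorded snapshot", name)
	}
	return nil
}
```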

At the time there was a parallel effort to add integration testing to our CI,
partly motivated by the parser rewrite. It wasn't ready yet. Two months after
heavily using my local barebones utility, my coworker
[@joe](https://twitter.com/jc_unknwon) brought this testing to our CI. It was a
game changer. Now all of our development was subject to this testing, not just
my local tinkering. I'd venture that this is probably the most valuable part of
our search testing infrastructure today. The crazy thing is how rock solid it
is. We get flakes in our CI all the time, but not from this part. For all its
complexity (setup and teardown of a live Sourcegraph instance, repository
cloning, and running some intensive tests), I can't recall a single time it's
buckled. I'm amazed every time I think about it. I don't know how Joe did it,
but the thing is just peak unglamorous engineering to me. I imagine it was
painful to write and test. But I'm so impressed. The wait was worth it and it's
been catching bugs ever since.

### Part 3: Fuzz testing

If unit testing is the appetizer and integration testing the main course, fuzz
testing is the dessert. Fuzz testing would reveal rare, corner-case inputs that
crash the server violently. They’re kind of fun to discover. I used the
excellent [dvyukov/go-fuzz](https://github.com/dvyukov/go-fuzz) tool and found
it a breeze to set up.
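
Setting up go-fuzz mostly means adding a `Fuzz` function to the package you want to fuzz. Here's a hypothetical harness for a parser entry point (`Parse` is a stand-in for the real one):

```go
// +build gofuzz

package query

// Fuzz is the entry point that go-fuzz expects: it feeds arbitrary byte strings
// to the parser, and any panic or out-of-bounds access shows up as a crasher.
// Parse is a stand-in for the real parser entry point.
func Fuzz(data []byte) int {
	if _, err := Parse(string(data)); err != nil {
		return 0 // input was rejected cleanly; nothing interesting
	}
	return 1 // input parsed successfully; prioritize it for further mutation
}
```

Building the harness with `go-fuzz-build` and pointing `go-fuzz` at the resulting archive then mutates inputs, tracks coverage, and records any crashers it finds.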

I ran local fuzz jobs for a couple of hours here and there throughout
development. Continuous fuzzing is nice to have, but local fuzzing was good
enough. This part caught three bugs. Two bugs caused a `panic` that broke an
assertion when concatenating certain patterns with unbalanced parentheses or
unconventional Unicode space characters
[#12457](https://github.com/sourcegraph/sourcegraph/pull/12457). The other was
caused by an out-of-bounds access for patterns ending with a trailing `\`
[#12463](https://github.com/sourcegraph/sourcegraph/pull/12463). This was on an
experimental, feature-flagged code path. No biggie. I was happy to find only
these, and that they were fairly low profile. More than anything, fuzz testing
gave me peace of mind that things were holding up.

### Part 4: Differential testing

Differential testing came towards the end of the migration work, when I was
ready to flip the switch for good.<a href="#footnote-1"><sup>1</sup></a>
The new parser had been active under a
feature flag on our dogfood instance and some customer instances, but it was
time to make it the default. The point of no return. For more peace of mind, I
wanted assurance that the data structure output of the new parser was
_interpreted_ the same way as the old one on a larger set of queries
than our integration tests covered. I collected a couple thousand queries from
the fuzz testing and live queries on our dogfood instance. I then ran these
through a utility that parses the input with both new and old parsers, converts
the two outputs to a unified data structure that encodes the query properties,
and then diffs the two outputs. Any difference implied that the backend would
interpret the query differently, and was therefore a potential bug.
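
The harness itself doesn't need to be fancy. Here's a simplified sketch of the idea, parameterized over two parse functions that each render their output in a shared canonical string form (the names and canonical rendering are illustrative, not the actual utility):

```go
package difftest

import (
	"bufio"
	"fmt"
	"os"
)

// compareParsers reads one query per line from a corpus file, runs both parsers,
// and reports every query where their canonical outputs disagree. parseOld and
// parseNew each parse a query and render the result in a shared canonical form.
func compareParsers(corpusPath string, parseOld, parseNew func(string) (string, error)) error {
	f, err := os.Open(corpusPath)
	if err != nil {
		return err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		query := scanner.Text()
		oldOut, oldErr := parseOld(query)
		newOut, newErr := parseNew(query)
		switch {
		case (oldErr == nil) != (newErr == nil):
			// One parser accepted the query and the other rejected it.
			fmt.Printf("DIFF %q: old err=%v, new err=%v\n", query, oldErr, newErr)
		case oldErr != nil:
			// Both rejected the query: that's agreement.
		case oldOut != newOut:
			fmt.Printf("DIFF %q:\n  old: %s\n  new: %s\n", query, oldOut, newOut)
		}
	}
	return scanner.Err()
}
```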

I caught one good bug with differential testing, where the previous parser ran
a heuristic step that escaped a trailing dangling parenthesis in regular
expressions. The heuristic interpreted an invalid regular expression like
`foo.*(` as `foo.*\(`. This was done to avoid throwing a syntax error at the
user and to instead yield matches for what they likely intended
[#12733](https://github.com/sourcegraph/sourcegraph/issues/12733). There were
three other differences that turned out to be fairly inconsequential, but nice
to catch. These were differing interpretations of reserved syntax. For example,
the reserved syntax `or` had special meaning in the new parser. The old parser
didn't ascribe any special meaning to `or`, and this (intentional) difference
was reflected in the testing.
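
For illustration, the kind of heuristic described above can be sketched in a few lines of Go; this is just the idea, not the parser's actual implementation:

```go
package query

import "regexp"

// escapeDanglingParen sketches the heuristic described above: if a pattern
// doesn't compile as a regular expression and ends with an unbalanced "(",
// retry with the trailing parenthesis escaped so the user gets matches for
// what they likely intended instead of a syntax error.
func escapeDanglingParen(pattern string) string {
	if _, err := regexp.Compile(pattern); err == nil {
		return pattern // already a valid regular expression
	}
	if len(pattern) > 0 && pattern[len(pattern)-1] == '(' {
		fixed := pattern[:len(pattern)-1] + `\(`
		if _, err := regexp.Compile(fixed); err == nil {
			return fixed // e.g. `foo.*(` becomes `foo.*\(`
		}
	}
	return pattern
}
```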

## The switch is flipped

In 2020 I flipped the switch to use the completely rewritten parser for
Sourcegraph search queries. It serves tens of thousands of users and processes
millions of queries. And after flipping the switch... nothing remarkable
happened. Nobody noticed. Zero reports of implementation bugs. Everything had
gone as planned. I'd pulled off my greatest feat of unglamorous engineering.

Hardly anyone could appreciate how months of effort culminated in an
incident-free transition. That was, after all, expected. I mean, how do you
derive appreciation from users, peers, or managers when the point was for no
one to notice anything significant had changed... when there's no perceptible
delta?

## Value in the unseen and unspoken

There's unglamorous engineering activity in the software all around us. For all
its lack of recognition, I wish we grasped its value a bit better. I'm reminded
of a tweet by a former colleague who researched donations for open source
projects:

<p style="padding: 0px 40px 0px 40px; text-align: justify;">“<i>From some informal interviews [we learned] people spend money on gruntwork—the stuff that's fun they're more likely to do anyway, money or not. </i>” <a href="https://twitter.com/b_vasilescu/status/1279199236094132227"><b>↗</b></a></p>

This suggests that gruntwork, if not glamorous, is certainly valuable (and
perhaps, even disproportionately so). At the same time, I wouldn't necessarily
call unglamorous engineering thankless. Your close peers may be very
thankful for a component rewrite or code refactor (I know I am), but no amount
of gratitude will make unglamorous work glamorous. It simply has to get done in
spite of whether there's a desire to tackle it, or some incentive or
recognition. And sometimes you might be the only one who knows it. Maybe it's
just part of writing code. At times you need to wear the hat of a hygiene
engineer.<a href="#footnote-2"><sup>2</sup></a> A developer-janitor. A code plumber. And that
cost can feel very personal.

The lamentable part of the unglamorous engineer's work is that there's little
account of these feats. Technical media and blogs are prone to talk about
features or intellectual explorations. The new and shiny is naturally engaging.
But I also want to hear about that impressive engineering feat that no one
noticed. The one where a developer or team pulled off some tectonic shift in a
codebase, everyone oblivious except themselves.

## New capabilities while steadily transitioning the old

Sourcegraph supports more combinations of operators now. And nothing outright
broke in order to get there. And to be clear, I did more than rewrite and test
parsers in that six-month time frame. There were frequent updates in
releases that customers could enable along the way, with a safe fallback to
disable new functionality. I also enabled new implementations on our dogfood
instance as things matured. The core implementation probably took only two or
three months, but crossing the point of no return and removing the fallback was
a slow and thorough process. There's more to tweak, but when the previous code
was finally dropped, it felt good and it felt right.

## Developer tools reduce unglamorous effort

While building the new parser I ran countless searches and constantly jumped
around the codebase to figure out how things interacted. It made me realize
that developer tools like code search really just empower me to do
unglamorous engineering more effectively. That's a very, very good thing. Of
course tools like code search aren't exclusive to humdrum work, but I wonder
how much value developers ultimately derive from tools that reduce unglamorous
effort. I bet it's substantial. Refactoring and rewriting, searching for
dependency versions and updating them, this _is_ the stuff of unglamorous
engineering. But maybe because the nature of the work is so unseen, so is the
value behind the developer tools that power it.

## In closing: Unglamorous engineering and you

We have a ton of work to do around search query _usability_ (story for another
day). Users are often surprised by how searches behave, even though the
behavior is intentional. But a surprised or mildly inconvenienced user is a far
cry from releasing a bug that takes down a company's instance. Severe bugs
cascade into out-of-band releases for our distribution team and upgrades for
customers. They also tend to have a latent effect on engineer productivity (in
this case, mine) when bugs later impose context-switches to fix things.
Conceptually big costs that I wanted to avoid. All of these considerations,
code changes, and heaps of testing happened in the pursuit of an unglamorous
outcome while two, maybe three, engineers reviewed the code to see it play out.
I know this isn't an isolated thing. I get faint hints of other engineers at
Sourcegraph doing momentous but unglamorous things that most of the
organization is blissfully unaware of. And the Twitterverse suggests there's
more of it happening in software all around us:

<p style="padding: 0px 40px 0px 40px; text-align: justify;">“<i>
A huge problem in software companies is that large new features get praise, promotions, accolades... while migrating off a legacy system, increasing performance ... or reducing error rates ... is often only recognized by peers...</i>” <a href="https://twitter.com/danmayer/status/139556425230854144"><b>↗</b></a></p>

I empathize with the engineers who don't have an audience for their unglamorous
work, who want to say "I did A Thing, there's nothing to see, but more people
should care. Let me tell you about it!". I like my portion of showpiece
engineering, don't get me wrong. But shouldn't doing the necessary, unglamorous
work be a marketable skill as well? Where's the signage that reads "Unglamorous
engineers wanted. Will pay handsomely."? I hope you're encouraged to share what
you've pulled off.

---

<a id="footnote-1"><sup>1</sup><small>If you've been following along, here's
where our three-course meal analogy breaks down. Let's just say
differential testing is the surprise food inspector hanging out in the
kitchen.</small></a>

<a id="footnote-2"><sup>2</sup><small>Amusingly, I once met an engineer from a
well-known tech company who used this title on their business card.</small></a>

---

**About the author**

[Rijnard](https://twitter.com/rvtond) is interested in developing new ways to search, manipulate, and fix code.
He holds a PhD in Computer Science from Carnegie Mellon University, where he researched automated bug fixing. He enjoys fundamental research but also wants to make research ideas a reality in practice. That's why he currently works at Sourcegraph, where he applies his research background to develop new tools and techniques for large scale code search and automated refactoring.

<small>Acks: Thanks [Rebecca Dodd](https://www.linkedin.com/in/rebecca-lee-dodd/) for feedback on the content of this post.</small>