-
Notifications
You must be signed in to change notification settings - Fork 169
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
blog: How Not to Break a Search Engine (#3446)
* blog: How Not to Break a Search Engine or: What I Learned About Unglamorous Engineering * Add description, hero image, copy edits * feedback * tagline * Tweak description, copy edits * Remove fig, make small * no box shadow * max-width:100% diagram for small width screens * ok but like actually remove the shadow * Missing comma Co-authored-by: Rebecca Dodd <rebecca@sourcegraph.com> Co-authored-by: Rebecca Dodd <82806852+rebeccadee@users.noreply.github.com>
- Loading branch information
1 parent
7af61b7
commit e236908
Showing
1 changed file
with
296 additions
and
0 deletions.
There are no files selected for viewing
296 changes: 296 additions & 0 deletions
296
...to-not-break-a-search-engine-or-what-i-learned-about-unglamorous-engineering.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,296 @@ | ||
--- | ||
title: "How not to break a search engine or: What I learned about unglamorous engineering" | ||
description: "When we switched to a new search query parser in September 2020, you'd never know that | ||
anything had changed. This is an account of the rigorous testing that | ||
happened behind the scenes to ensure a seamless transition." | ||
author: Rijnard van Tonder | ||
authorUrl: https://twitter.com/rvtond | ||
publishDate: 2021-06-25T10:00-07:00 | ||
tags: [blog, code, search, software, engineering, testing] | ||
slug: how-not-to-break-a-search-engine-unglamorous-engineering | ||
heroImage: https://sourcegraphstatic.com/blog/how-not-to-break-a-search-engine-unglamorous-engineering.jpg | ||
socialImage: https://sourcegraphstatic.com/blog/how-not-to-break-a-search-engine-unglamorous-engineering.jpg | ||
published: true | ||
--- | ||
|
||
![Unglamorous engineering](https://sourcegraphstatic.com/blog/how-not-to-break-a-search-engine-unglamorous-engineering.jpg) | ||
|
||
>_"In 2020 I flipped the switch to use a completely rewritten parser for | ||
Sourcegraph search queries. It serves tens of thousands of users and processes | ||
millions of queries. And after flipping the switch... nothing remarkable | ||
happened. Nobody noticed. Zero reports of implementation bugs. Everything had | ||
gone as planned. I'd pulled off my greatest feat of unglamorous engineering."_ | ||
|
||
|
||
Early last year I started rewriting the parser that processes search queries in | ||
Sourcegraph—the bit that users type into the search bar. This component | ||
processes every single input that goes into the search bar when users search | ||
for code: | ||
|
||
<img src="https://storage.googleapis.com/sourcegraph-assets/about.sourcegraph.com/blog/2021/search-bar.png" style="width: 660px" alt="Code search input bar"> | ||
|
||
When the switch activated the new parser in September 2020, you'd never know | ||
that anything had changed. This is an account of the invisible, rigorous | ||
testing that ensured a seamless transition. | ||
|
||
The motivation for the change was to introduce standard boolean operators like | ||
`and`, `or`, and `not` in queries. Sourcegraph had supported some of these | ||
notions through regular expressions, but we were missing more general | ||
functionality, and the syntax to accompany it. This new functionality meant | ||
rewriting the query parser from scratch (I'm going to gloss over why it had to | ||
be done from scratch, but just bear with me, this wasn't a component we could | ||
retrofit). These boolean operators were a clear product win. I was excited that | ||
a rewrite could later unlock new prospects for our search syntax. I knew these | ||
were the goals. But to get to the goal I had to suspend my focus on the shiny | ||
"feature" part of this work. Something else dwarfed other concerns. | ||
|
||
Sourcegraph's bread and butter is code search. Every single search query is | ||
going to run through this new code. Bugs in a beta feature are understandable. | ||
A UI bug may be temporarily excusable. But a bug in the query processing of the | ||
core product could spell a lot of trouble. What if previously working | ||
queries stopped working? What if they behaved incorrectly? What if an input crashed | ||
the server? The point was to support new operators in the query, and I was | ||
going to introduce these, but mentally I was more preoccupied with ensuring | ||
that _existing queries keep working_ the way that they should. | ||
|
||
Writing a parser isn't _that_ difficult. But it's a different matter to write | ||
one from scratch for a fresh project versus swapping out something that is in | ||
use by thousands of users. When the stakes are high, simple things become | ||
challenging. This code will need checking and checking again. Not exactly | ||
glamorous. | ||
|
||
The changes weren't just about the act of parsing either. The new parser would | ||
produce a different internal representation of a query. Trees would represent | ||
our queries now, no longer just a sequential list of terms. This new | ||
representation needed to keep working with what our backend internals expect. | ||
That, or bugs galore. | ||
|
||
I made up my mind that, as far as I could help it, swapping our parsers wasn't | ||
going to break Sourcegraph. Not on my account. | ||
|
||
## How not to break a search engine | ||
|
||
Over the course of development I progressively tested new changes in four | ||
distinct ways. I wasn't following a checklist of things to do. What mattered | ||
was that I could convince myself that things worked as expected. I wanted | ||
to sleep well at night. Testing along these four different dimensions gave me | ||
confidence that I could expect few (if any) surprises. In general, the sort of | ||
testing for application rewrites or migration is going to depend on the | ||
context. I feel like the steps I took worked well, and they probably generalize | ||
well for migrating other systems that accept user input. | ||
|
||
### Part 1: Unit testing | ||
|
||
Unit testing was the obvious thing to do™ and familiar territory for a lot of | ||
developers, so I won't belabor this part. I thought of this as my frontline | ||
defense for testing correctness. I reused some of our existing parser tests | ||
(which served as a rough specification of things to get right) and also added a | ||
lot of additional tests for new parts of the syntax. You bet there's test | ||
coverage. | ||
|
||
<img src="https://storage.googleapis.com/sourcegraph-assets/about.sourcegraph.com/blog/2021/parser-coverage.png" style="width: 300px" alt="Parser file unit test coverage"> | ||
|
||
### Part 2: Integration testing | ||
|
||
Once I wrote a feature-flagged code path to use the new parser, it was time to | ||
test whether Sourcegraph behaved functionally similar. Integration tests ran a | ||
search query through a running instance of the Sourcegraph application and | ||
checked whether search results were expected. This testing phase was so | ||
important because the parser produced a new and different internal | ||
representation. I'd abstracted out a common interface for our backend to access | ||
the new data structure under the feature-flagged code path to test on. | ||
|
||
<figure style="width: 800px; max-width: 100%; margin: auto;"> | ||
<img src="https://storage.googleapis.com/sourcegraph-assets/about.sourcegraph.com/blog/2021/integration.png" alt="Integration testing diagram" style="box-shadow: none;"> | ||
<i><small>Integration testing abstracts a common interface for our backend | ||
to access query values. | ||
We test that the backend produces the same search results for | ||
simple queries (ones that don't have, e.g., </i> | ||
<code>or</code> | ||
<i>-operators), irrespective of whether those values originate from the existing parser's output or the new one.</i></small></a> | ||
</figure> | ||
|
||
<br> | ||
|
||
When I got to this part, we didn't have a good way to run integration tests. We | ||
had browser-based end-to-end testing that was onerous to set up, time-consuming | ||
to run, and brittle. I was desperate for a way to do this more easily and | ||
quickly, and only really needed to run queries through our backend GraphQL API, | ||
not the browser. I wrote a barebones utility to locally run queries through our | ||
backend and snapshot the results | ||
([#10712](https://github.com/sourcegraph/sourcegraph/pull/10712)). This testing | ||
was _immensely_ helpful because I could lock in existing expected search | ||
behaviors for every new addition. I didn't want a user to enter a reasonable | ||
query and get an unreasonable result. It preempted a bunch of behavioral bugs | ||
as I tweaked data structures and interfaces. | ||
|
||
At the time there was a parallel effort to add integration testing to our CI, | ||
partly motivated by the parser rewrite. It wasn't ready yet. Two months after | ||
heavily using my local barebones utility, my coworker | ||
[@joe](https://twitter.com/jc_unknwon) brought this testing to our CI. It was a | ||
game changer. Now all of our development was subject to this testing, not just | ||
my local tinkering. I'd venture that this is probably the most valuable part of | ||
our search testing infrastructure today. The wild thing is how rock solid it | ||
is. We get flakes in our CI all the time, but not from this part. For all its | ||
complexity (set up and tear down of a live Sourcegraph instance, repository | ||
cloning, and running some intensive tests), I can't recall a single time it's | ||
buckled. I'm amazed every time I think about it. I don't know how Joe did it, | ||
but the thing is just peak unglamorous engineering to me. I imagine it was | ||
painful to write and test, but I'm so impressed. The wait was worth it and it's | ||
been catching bugs ever since. | ||
|
||
### Part 3: Fuzz testing | ||
|
||
If unit testing is the appetizer and integration testing the main course, fuzz | ||
testing is the dessert. Fuzz testing would reveal rare, corner-case inputs that | ||
crash the server violently. They’re kind of fun to discover. I used the | ||
excellent [dyukov/go-fuzz](https://github.com/dvyukov/go-fuzz) tool and found | ||
it's a breeze to set up. | ||
|
||
I ran local fuzz jobs for a couple of hours here and there throughout | ||
development. Continuous fuzzing is nice to have, but local fuzzing was good | ||
enough. This part caught three bugs, two of which caused a `panic` that broke an | ||
assertion when concatenating certain patterns with unbalanced parentheses or | ||
unconventional unicode space characters | ||
([#12457](https://github.com/sourcegraph/sourcegraph/pull/12457)). The other was | ||
caused by an out-of-bound access for patterns that ending with a trailing `\` | ||
([#12463](https://github.com/sourcegraph/sourcegraph/pull/12463)). This was on an | ||
experimental feature-flagged code path. No biggie. I was happy to find only | ||
these, and that they were fairly low profile. More than anything, fuzz testing | ||
gave me peace of mind that things were holding up. | ||
|
||
### Part 4: Differential testing | ||
|
||
Differential testing came towards the end of the migration work, when I was | ||
ready to flip the switch for good.<a href="#footnote-1"><sup>1</sup></a> The | ||
new parser had been active under a feature flag on our dogfood instance and | ||
some customer instances, but it was time to make it the default. The point of | ||
no return. For more peace of mind, I wanted assurance that the data structure | ||
output of the new parser was _interpreted_ the same way as the old one on a | ||
larger set of queries than our integration tests covered. I collected a couple | ||
thousand queries from the fuzz testing and live queries on our dogfood | ||
instance. I then ran these through a utility that parses the input with both | ||
new and old parsers, converts the two outputs to a unified data structure that | ||
encodes the query properties, and then diffs the two outputs. Any difference | ||
implied that the query output was interpreted differently by the backend and a | ||
potential bug. | ||
|
||
I caught one good bug with differential testing, where the previous parser ran | ||
a heuristic step that escapes a trailing dangling parenthesis for regular | ||
expressions. The heuristic interpreted an invalid regular expression like | ||
`foo.*(` as `foo.*\(`. This is to avoid throwing a syntax error at the user and | ||
instead yield matches for what they likely intended | ||
([#12733](https://github.com/sourcegraph/sourcegraph/issues/12733)). There were | ||
three other differences that turned out to be fairly inconsequential, but nice | ||
to catch. These bugs were about differing interpretations of reserved syntax. | ||
For example, the new reserved syntax `or` in the new parser had special | ||
meaning. The old parser didn't ascribe any special meaning to `or`, and this | ||
(intentional) difference reflected in the testing. | ||
|
||
## The switch is flipped | ||
|
||
In 2020 I flipped the switch to use the completely rewritten parser for | ||
Sourcegraph search queries. It serves tens of thousands of users and processes | ||
millions of queries. And after flipping the switch... nothing remarkable | ||
happened. Nobody noticed. Zero reports of implementation bugs. Everything had | ||
gone as planned. I'd pulled off my greatest feat of unglamorous engineering. | ||
|
||
Hardly anyone could appreciate how months of effort culminated in an | ||
incident-free transition. That was, after all, expected. I mean, how do you | ||
derive appreciation from users, peers, or managers when the point was for no | ||
one to notice anything significant had changed; when there's no perceptible | ||
delta? | ||
|
||
## Value in the unseen and unspoken | ||
|
||
There's unglamorous engineering in the software all around us. For all its lack | ||
of recognition, I wish we grasped its value a bit better. I'm reminded of a | ||
tweet by a former colleague who researched donations for open source projects: | ||
|
||
<blockquote class="twitter-tweet tw-align-center" data-conversation="none"><p lang="en" dir="ltr">I can tell you from some informal interviews we did outside that paper, that people spend the money on gruntwork — the stuff that’s fun they’re more likely to do anyway, money or not.</p>— Bogdan Vasilescu (@b_vasilescu) <a href="https://twitter.com/b_vasilescu/status/1279199236094132227?ref_src=twsrc%5Etfw">July 3, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> | ||
|
||
This suggests that gruntwork, if not glamorous, is certainly valuable (and | ||
perhaps, even disproportionately so). At the same time, I wouldn't necessarily | ||
call unglamorous engineering thankless. Your close peers may be very | ||
thankful for a component rewrite or code refactor (I know I am), but no amount | ||
of gratitude will make unglamorous work glamorous. It simply has to get done in | ||
spite of whether there's a desire to tackle it, or some incentive or | ||
recognition. And sometimes you might be the only one who knows it. Maybe it's | ||
just part of writing code. At times you need to wear the hat of a hygiene | ||
engineer.<a href="#footnote-2"><sup>2</sup></a> A developer-janitor. A code plumber. And that | ||
cost can feel very personal. | ||
|
||
The lamentable part of the unglamorous engineer's work is that there's little | ||
account of these feats. Technical media and blogs are prone to talk about | ||
features or intellectual explorations. The new and shiny is naturally engaging. | ||
But I also want to hear about that impressive engineering feat that no one | ||
noticed. The one where a developer or team pulled off some tectonic shift in a | ||
codebase, everyone oblivious except themselves. | ||
|
||
## New capabilities while steadily transitioning the old | ||
|
||
Sourcegraph supports more combinations of operators now. And nothing outright | ||
broke in order to get there. In hindsight, did I go overboard on some parts and | ||
would I have done things differently? In short, no. New functionality was | ||
rolled out iteratively and quickly in phases that users could freely try along | ||
the way. I also enabled new implementations on our dogfood instance as things | ||
matured. The core implementation and testing probably took only one or two | ||
months, but crossing the point of no return and removing the fallback was a | ||
slow and thorough process. If I sensed that excessive testing was stunting | ||
progress and delaying the planned timeline, I might feel differently, but I | ||
never got that sense. And to be clear, I did more than rewriting and testing | ||
parsers in that six-month time frame, but that's off topic. Our current state | ||
isn't perfect, there's more to tweak—but when the previous code was finally | ||
dropped, it wasn't one of those typical anxiety-inducing rushes to hit a | ||
deadline. It felt good and it felt right. | ||
|
||
## In closing: Unglamorous engineering and you | ||
|
||
We have a ton of work to do around search query _usability_ (story for another | ||
day). Users are often surprised by how searches behave, even though the | ||
behavior is intentional. But a surprised or mildly inconvenienced user is a far | ||
cry from releasing a bug that takes down a company's instance. Severe bugs | ||
cascade into out-of-band releases for our distribution team and upgrades for | ||
customers. They also tend to have a latent effect on engineer productivity (in | ||
this case, mine) when bugs later impose context-switches to fix | ||
things—conceptually big costs that I wanted to avoid. All of these | ||
considerations, code changes, and heaps of testing happened in the pursuit of | ||
an unglamorous outcome while two, maybe three, engineers reviewed the code to | ||
see it play out. I know this isn't an isolated thing. I get faint hints of | ||
other engineers at Sourcegraph doing momentous but unglamorous things that most | ||
of the organization is blissfully unaware of. And the Twitterverse suggests | ||
there's more of it happening in software all around us: | ||
|
||
<blockquote class="twitter-tweet tw-align-center"><p lang="en" dir="ltr">A huge problem in software companies is that large new features get praise, promotions, accolades... while migrating off a legacy system, increasing performance 2,4,10X, or reducing error rates, pages, or alerts by X% is often only recognized by peers and not leadership.</p>— Dan Mayer (@danmayer) <a href="https://twitter.com/danmayer/status/1395564252308541440?ref_src=twsrc%5Etfw">May 21, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> | ||
|
||
I empathize with the engineers who don't have an audience for their unglamorous | ||
work, who want to say, "I did A Thing, there's nothing to see, but more people | ||
should care. Let me tell you about it!" I like my portion of showpiece | ||
engineering, don't get me wrong. But shouldn't doing the necessary, unglamorous | ||
work be a marketable skill as well? Where's the signage that reads "Unglamorous | ||
engineers wanted. Will pay handsomely"? I hope you're encouraged to share what | ||
you've pulled off. | ||
|
||
--- | ||
|
||
<a id="footnote-1"><sup>1</sup><small>If you've been following along, here's | ||
where our three-course meal analogy breaks down. Let's just say | ||
differential testing is the surprise food inspector hanging out in the | ||
kitchen.</small></a> | ||
|
||
<a id="footnote-2"><sup>2</sup><small>Amusingly, I once met an engineer from a | ||
well-known tech company who used this title on their business card.</small></a> | ||
|
||
--- | ||
|
||
**About the author** | ||
|
||
_[Rijnard](https://twitter.com/rvtond) is interested in developing new ways to | ||
search, manipulate, and fix code. He holds a PhD in Computer Science from | ||
Carnegie Mellon University, where he researched automated bug fixing. He enjoys | ||
fundamental research but also wants to make research ideas a reality in | ||
practice. That's why he currently works at Sourcegraph, where he applies his | ||
research background to develop new tools and techniques for large-scale code | ||
search and automated refactoring._ | ||
|
||
<small>Acks: Thanks [Rebecca Dodd](/company/team#rebecca-dodd-she-her) and [Camden Cheek](https://twitter.com/camdendcheek) for feedback on the content of this post.</small> |