Merge branch 'main' into refactor_core_transforms
riley-harper committed Aug 27, 2024
2 parents 54b139b + f10b822 commit ea61a97
Showing 28 changed files with 515 additions and 116 deletions.
2 changes: 1 addition & 1 deletion docs/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 7c0cbb466d9d372015efeb02dbb55935
+config: 39ce5278228b1c0d08a3852a93441384
tags: 645f666f9bcd5a90fca523b33c5a78b7
36 changes: 23 additions & 13 deletions docs/_sources/feature_selection_transforms.md.txt
@@ -1,16 +1,26 @@
-# Feature Selection transforms
-
-Each header below represents a feature selection transform. These transforms are used in the context of `feature_selections`.
-
-```
-[[feature_selections]]
-input_column = "clean_birthyr"
-output_column = "replaced_birthyr"
-condition = "case when clean_birthyr is null or clean_birthyr == '' then year - age else clean_birthyr end"
-transform = "sql_condition"
-```
-
-There are some additional attributes available for all transforms: `checkpoint`, `override_column_a`, `override_column_b`, `set_value_column_a`, `set_value_column_b`.
+# Feature Selection Transforms
+
+Each feature selection in the `[[feature_selections]]` list must have a
+`transform` attribute which tells hlink which transform it uses. The available
+feature selection transforms are listed below. The attributes of the feature
+selection often vary with the feature selection transform. However, there are a
+few utility attributes which are available for all transforms:
+
+- `override_column_a` - Type: `string`. Optional. Given the name of a column in
+dataset A, copy that column into the output column instead of computing the
+feature selection for dataset A. This does not affect dataset B.
+- `override_column_b` - Type: `string`. Optional. Given the name of a column in
+dataset B, copy that column into the output column instead of computing the
+feature selection for dataset B. This does not affect dataset A.
+- `set_value_column_a` - Type: any. Optional. Instead of computing the feature
+selection for dataset A, use the given value for every row in the output
+column. This does not affect dataset B.
+- `set_value_column_b` - Type: any. Optional. Instead of computing the feature
+selection for dataset B, use the given value for every row in the output
+column. This does not affect dataset A.
+- `checkpoint` - Type: `boolean`. Optional. If set to true, checkpoint the
+dataset in Spark before computing the feature selection. This can reduce some
+resource usage for very complex workflows, but should not be necessary.

## bigrams

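The following sketch shows how the utility attributes described in the new text combine with a transform's own attributes. It reuses the `sql_condition` example from the old version of the page and adds two utility attributes. The column name `birthyr_b` is hypothetical and stands for a column that already exists in dataset B. The fragment has not been validated against a real hlink configuration.

```toml
[[feature_selections]]
input_column = "clean_birthyr"
output_column = "replaced_birthyr"
condition = "case when clean_birthyr is null or clean_birthyr == '' then year - age else clean_birthyr end"
transform = "sql_condition"
# Utility attribute: for dataset B only, copy the hypothetical column
# birthyr_b into replaced_birthyr instead of evaluating the condition.
# Dataset A is still computed with the condition above.
override_column_b = "birthyr_b"
# Utility attribute: checkpoint the dataset in Spark before computing this
# feature selection. The docs say this should not normally be necessary.
checkpoint = true
```

Based on the attribute descriptions, `set_value_column_b` could replace `override_column_b` here. Dataset B's output column would then be filled with a single constant rather than copied from an existing column.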
2 changes: 1 addition & 1 deletion docs/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
-VERSION: '3.6.0',
+VERSION: '3.6.1',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
4 changes: 2 additions & 2 deletions docs/column_mappings.html
@@ -5,10 +5,10 @@
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

-<title>Column Mappings &#8212; hlink 3.6.0 documentation</title>
+<title>Column Mappings &#8212; hlink 3.6.1 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=d1102ebc" />
<link rel="stylesheet" type="text/css" href="_static/alabaster.css?v=12dfc556" />
-<script src="_static/documentation_options.js?v=5349f462"></script>
+<script src="_static/documentation_options.js?v=f731707b"></script>
<script src="_static/doctools.js?v=888ff710"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<link rel="index" title="Index" href="genindex.html" />
4 changes: 2 additions & 2 deletions docs/comparison_types.html
@@ -5,10 +5,10 @@
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

-<title>Comparison types, transform add-ons, aggregate features, and household aggregate features &#8212; hlink 3.6.0 documentation</title>
+<title>Comparison types, transform add-ons, aggregate features, and household aggregate features &#8212; hlink 3.6.1 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=d1102ebc" />
<link rel="stylesheet" type="text/css" href="_static/alabaster.css?v=12dfc556" />
-<script src="_static/documentation_options.js?v=5349f462"></script>
+<script src="_static/documentation_options.js?v=f731707b"></script>
<script src="_static/doctools.js?v=888ff710"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<link rel="index" title="Index" href="genindex.html" />
4 changes: 2 additions & 2 deletions docs/config.html
@@ -5,10 +5,10 @@
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

-<title>Configuration &#8212; hlink 3.6.0 documentation</title>
+<title>Configuration &#8212; hlink 3.6.1 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=d1102ebc" />
<link rel="stylesheet" type="text/css" href="_static/alabaster.css?v=12dfc556" />
-<script src="_static/documentation_options.js?v=5349f462"></script>
+<script src="_static/documentation_options.js?v=f731707b"></script>
<script src="_static/doctools.js?v=888ff710"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<link rel="index" title="Index" href="genindex.html" />
37 changes: 25 additions & 12 deletions docs/feature_selection_transforms.html
@@ -5,10 +5,10 @@
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

-<title>Feature Selection transforms &#8212; hlink 3.6.0 documentation</title>
+<title>Feature Selection Transforms &#8212; hlink 3.6.1 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=d1102ebc" />
<link rel="stylesheet" type="text/css" href="_static/alabaster.css?v=12dfc556" />
-<script src="_static/documentation_options.js?v=5349f462"></script>
+<script src="_static/documentation_options.js?v=f731707b"></script>
<script src="_static/doctools.js?v=888ff710"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<link rel="index" title="Index" href="genindex.html" />
@@ -33,16 +33,29 @@
<div class="body" role="main">

<section id="feature-selection-transforms">
-<h1>Feature Selection transforms<a class="headerlink" href="#feature-selection-transforms" title="Link to this heading"></a></h1>
-<p>Each header below represents a feature selection transform. These transforms are used in the context of <code class="docutils literal notranslate"><span class="pre">feature_selections</span></code>.</p>
-<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[[</span><span class="n">feature_selections</span><span class="p">]]</span>
-<span class="n">input_column</span> <span class="o">=</span> <span class="s2">&quot;clean_birthyr&quot;</span>
-<span class="n">output_column</span> <span class="o">=</span> <span class="s2">&quot;replaced_birthyr&quot;</span>
-<span class="n">condition</span> <span class="o">=</span> <span class="s2">&quot;case when clean_birthyr is null or clean_birthyr == &#39;&#39; then year - age else clean_birthyr end&quot;</span>
-<span class="n">transform</span> <span class="o">=</span> <span class="s2">&quot;sql_condition&quot;</span>
-</pre></div>
-</div>
-<p>There are some additional attributes available for all transforms: <code class="docutils literal notranslate"><span class="pre">checkpoint</span></code>, <code class="docutils literal notranslate"><span class="pre">override_column_a</span></code>, <code class="docutils literal notranslate"><span class="pre">override_column_b</span></code>, <code class="docutils literal notranslate"><span class="pre">set_value_column_a</span></code>, <code class="docutils literal notranslate"><span class="pre">set_value_column_b</span></code>.</p>
+<h1>Feature Selection Transforms<a class="headerlink" href="#feature-selection-transforms" title="Link to this heading"></a></h1>
+<p>Each feature selection in the <code class="docutils literal notranslate"><span class="pre">[[feature_selections]]</span></code> list must have a
+<code class="docutils literal notranslate"><span class="pre">transform</span></code> attribute which tells hlink which transform it uses. The available
+feature selection transforms are listed below. The attributes of the feature
+selection often vary with the feature selection transform. However, there are a
+few utility attributes which are available for all transforms:</p>
+<ul class="simple">
+<li><p><code class="docutils literal notranslate"><span class="pre">override_column_a</span></code> - Type: <code class="docutils literal notranslate"><span class="pre">string</span></code>. Optional. Given the name of a column in
+dataset A, copy that column into the output column instead of computing the
+feature selection for dataset A. This does not affect dataset B.</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">override_column_b</span></code> - Type: <code class="docutils literal notranslate"><span class="pre">string</span></code>. Optional. Given the name of a column in
+dataset B, copy that column into the output column instead of computing the
+feature selection for dataset B. This does not affect dataset A.</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">set_value_column_a</span></code> - Type: any. Optional. Instead of computing the feature
+selection for dataset A, use the given value for every row in the output
+column. This does not affect dataset B.</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">set_value_column_b</span></code> - Type: any. Optional. Instead of computing the feature
+selection for dataset B, use the given value for every row in the output
+column. This does not affect dataset A.</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">checkpoint</span></code> - Type: <code class="docutils literal notranslate"><span class="pre">boolean</span></code>. Optional. If set to true, checkpoint the
+dataset in Spark before computing the feature selection. This can reduce some
+resource usage for very complex workflows, but should not be necessary.</p></li>
+</ul>
<section id="bigrams">
<h2>bigrams<a class="headerlink" href="#bigrams" title="Link to this heading"></a></h2>
<p>Split the given string column into <a class="reference external" href="https://en.wikipedia.org/wiki/Bigram">bigrams</a>.</p>
4 changes: 2 additions & 2 deletions docs/genindex.html
@@ -4,10 +4,10 @@
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
-<title>Index &#8212; hlink 3.6.0 documentation</title>
+<title>Index &#8212; hlink 3.6.1 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=d1102ebc" />
<link rel="stylesheet" type="text/css" href="_static/alabaster.css?v=12dfc556" />
-<script src="_static/documentation_options.js?v=5349f462"></script>
+<script src="_static/documentation_options.js?v=f731707b"></script>
<script src="_static/doctools.js?v=888ff710"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<link rel="index" title="Index" href="#" />
4 changes: 2 additions & 2 deletions docs/index.html
@@ -5,10 +5,10 @@
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

-<title>Welcome to hlink’s documentation! &#8212; hlink 3.6.0 documentation</title>
+<title>Welcome to hlink’s documentation! &#8212; hlink 3.6.1 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=d1102ebc" />
<link rel="stylesheet" type="text/css" href="_static/alabaster.css?v=12dfc556" />
-<script src="_static/documentation_options.js?v=5349f462"></script>
+<script src="_static/documentation_options.js?v=f731707b"></script>
<script src="_static/doctools.js?v=888ff710"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<link rel="index" title="Index" href="genindex.html" />
4 changes: 2 additions & 2 deletions docs/installation.html
@@ -5,10 +5,10 @@
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

-<title>Installation &#8212; hlink 3.6.0 documentation</title>
+<title>Installation &#8212; hlink 3.6.1 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=d1102ebc" />
<link rel="stylesheet" type="text/css" href="_static/alabaster.css?v=12dfc556" />
-<script src="_static/documentation_options.js?v=5349f462"></script>
+<script src="_static/documentation_options.js?v=f731707b"></script>
<script src="_static/doctools.js?v=888ff710"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<link rel="index" title="Index" href="genindex.html" />
4 changes: 2 additions & 2 deletions docs/introduction.html
@@ -5,10 +5,10 @@
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

-<title>Introduction &#8212; hlink 3.6.0 documentation</title>
+<title>Introduction &#8212; hlink 3.6.1 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=d1102ebc" />
<link rel="stylesheet" type="text/css" href="_static/alabaster.css?v=12dfc556" />
-<script src="_static/documentation_options.js?v=5349f462"></script>
+<script src="_static/documentation_options.js?v=f731707b"></script>
<script src="_static/doctools.js?v=888ff710"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<link rel="index" title="Index" href="genindex.html" />
4 changes: 2 additions & 2 deletions docs/link_tasks.html
@@ -5,10 +5,10 @@
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

-<title>Link Tasks &#8212; hlink 3.6.0 documentation</title>
+<title>Link Tasks &#8212; hlink 3.6.1 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=d1102ebc" />
<link rel="stylesheet" type="text/css" href="_static/alabaster.css?v=12dfc556" />
-<script src="_static/documentation_options.js?v=5349f462"></script>
+<script src="_static/documentation_options.js?v=f731707b"></script>
<script src="_static/doctools.js?v=888ff710"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<link rel="index" title="Index" href="genindex.html" />
4 changes: 2 additions & 2 deletions docs/models.html
@@ -5,10 +5,10 @@
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

-<title>Models &#8212; hlink 3.6.0 documentation</title>
+<title>Models &#8212; hlink 3.6.1 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=d1102ebc" />
<link rel="stylesheet" type="text/css" href="_static/alabaster.css?v=12dfc556" />
-<script src="_static/documentation_options.js?v=5349f462"></script>
+<script src="_static/documentation_options.js?v=f731707b"></script>
<script src="_static/doctools.js?v=888ff710"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<link rel="index" title="Index" href="genindex.html" />
Binary file modified docs/objects.inv
Binary file not shown.
4 changes: 2 additions & 2 deletions docs/pipeline_features.html
@@ -5,10 +5,10 @@
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

-<title>Pipeline generated features &#8212; hlink 3.6.0 documentation</title>
+<title>Pipeline generated features &#8212; hlink 3.6.1 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=d1102ebc" />
<link rel="stylesheet" type="text/css" href="_static/alabaster.css?v=12dfc556" />
-<script src="_static/documentation_options.js?v=5349f462"></script>
+<script src="_static/documentation_options.js?v=f731707b"></script>
<script src="_static/doctools.js?v=888ff710"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<link rel="index" title="Index" href="genindex.html" />
4 changes: 2 additions & 2 deletions docs/running_the_program.html
@@ -5,10 +5,10 @@
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

-<title>Running hlink &#8212; hlink 3.6.0 documentation</title>
+<title>Running hlink &#8212; hlink 3.6.1 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=d1102ebc" />
<link rel="stylesheet" type="text/css" href="_static/alabaster.css?v=12dfc556" />
-<script src="_static/documentation_options.js?v=5349f462"></script>
+<script src="_static/documentation_options.js?v=f731707b"></script>
<script src="_static/doctools.js?v=888ff710"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<link rel="index" title="Index" href="genindex.html" />
4 changes: 2 additions & 2 deletions docs/search.html
@@ -4,11 +4,11 @@
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
-<title>Search &#8212; hlink 3.6.0 documentation</title>
+<title>Search &#8212; hlink 3.6.1 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=d1102ebc" />
<link rel="stylesheet" type="text/css" href="_static/alabaster.css?v=12dfc556" />

-<script src="_static/documentation_options.js?v=5349f462"></script>
+<script src="_static/documentation_options.js?v=f731707b"></script>
<script src="_static/doctools.js?v=888ff710"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<script src="_static/searchtools.js"></script>
2 changes: 1 addition & 1 deletion docs/searchindex.js

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions docs/substitutions.html
@@ -5,10 +5,10 @@
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

-<title>Substitutions &#8212; hlink 3.6.0 documentation</title>
+<title>Substitutions &#8212; hlink 3.6.1 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=d1102ebc" />
<link rel="stylesheet" type="text/css" href="_static/alabaster.css?v=12dfc556" />
-<script src="_static/documentation_options.js?v=5349f462"></script>
+<script src="_static/documentation_options.js?v=f731707b"></script>
<script src="_static/doctools.js?v=888ff710"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<link rel="index" title="Index" href="genindex.html" />
4 changes: 2 additions & 2 deletions docs/use_examples.html
@@ -5,10 +5,10 @@
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

-<title>Advanced Workflow Examples &#8212; hlink 3.6.0 documentation</title>
+<title>Advanced Workflow Examples &#8212; hlink 3.6.1 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=d1102ebc" />
<link rel="stylesheet" type="text/css" href="_static/alabaster.css?v=12dfc556" />
-<script src="_static/documentation_options.js?v=5349f462"></script>
+<script src="_static/documentation_options.js?v=f731707b"></script>
<script src="_static/doctools.js?v=888ff710"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<link rel="index" title="Index" href="genindex.html" />