Commit

feat: timeline mask (#293)
* feat: wip! add axis package for timeline generation

* style: lint

* feat(timeline): wip! update masking model

* feat(timeline): wip! activate mask

* feat(timeline): wip! generate valid dates

* style: lint

* feat(timeline): add constraints

* feat(timeline): update json schema

* chore: update docker-compose commands

* feat(timeline): add error handling

* feat(timeline): set max retry + onError reject or nullify

* docs(timeline): update changelog

* feat(timeline): default value

* feat(timeline): fix unit test

* test(timeline): add venom tests

* docs(timeline): update readme

* docs(timeline): fix readme

* fix(timeline): nil pointer exception

* fix(timeline): nil pointer exception

* docs(timeline): fix bullet alignment
adrienaury authored Apr 5, 2024
1 parent 7206233 commit 1b6d7f6
Showing 12 changed files with 977 additions and 13 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/ci.yml
@@ -28,18 +28,18 @@ jobs:
           fetch-depth: 0

       - name: Build
-        run: docker-compose -f .devcontainer/docker-compose.yml build
+        run: docker compose -f .devcontainer/docker-compose.yml build

       - name: Start services
-        run: docker-compose -f .devcontainer/docker-compose.yml up -d vscode
+        run: docker compose -f .devcontainer/docker-compose.yml up -d vscode

       - name: Init env
-        run: docker-compose -f .devcontainer/docker-compose.yml exec -T -u root -w /workspace vscode make init
+        run: docker compose -f .devcontainer/docker-compose.yml exec -T -u root -w /workspace vscode make init

       - uses: FranzDiebold/github-env-vars-action@v2.4.0
       - name: Run CI # up to test-int (info → refresh → lint → test → release → test-int)
         run: |
-          docker-compose -f .devcontainer/docker-compose.yml exec \
+          docker compose -f .devcontainer/docker-compose.yml exec \
             -T \
             -u root \
             -w /workspace \
18 changes: 9 additions & 9 deletions .github/workflows/release.yml
@@ -26,33 +26,33 @@ jobs:
           fetch-depth: 0

       - name: Build
-        run: docker-compose -f .devcontainer/docker-compose.yml build
+        run: docker compose -f .devcontainer/docker-compose.yml build

       - name: Start services
-        run: docker-compose -f .devcontainer/docker-compose.yml up -d vscode
+        run: docker compose -f .devcontainer/docker-compose.yml up -d vscode

       - name: Init env
-        run: docker-compose -f .devcontainer/docker-compose.yml exec -T -u root -w /workspace vscode make init
+        run: docker compose -f .devcontainer/docker-compose.yml exec -T -u root -w /workspace vscode make init

       - uses: FranzDiebold/github-env-vars-action@v2
       - name: Release
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
           DOCKERHUB_PASS: ${{ secrets.DOCKERHUB_PASS }}
         run: |
-          docker-compose -f .devcontainer/docker-compose.yml exec -T -u root -w /workspace vscode git reset --hard
-          docker-compose -f .devcontainer/docker-compose.yml exec -e GITHUB_TOKEN=${GITHUB_TOKEN} -T -u root vscode bash -c 'echo "GITHUB_TOKEN: ${GITHUB_TOKEN}" > ~/.github.yml'
-          docker-compose -f .devcontainer/docker-compose.yml exec -T -u root vscode bash -c 'echo "DOCKERHUB_USER: cgibot" > ~/.dockerhub.yml'
-          docker-compose -f .devcontainer/docker-compose.yml exec -e DOCKERHUB_PASS=${DOCKERHUB_PASS} -T -u root vscode bash -c 'echo "DOCKERHUB_PASS: ${DOCKERHUB_PASS}" >> ~/.dockerhub.yml'
-          docker-compose -f .devcontainer/docker-compose.yml exec \
+          docker compose -f .devcontainer/docker-compose.yml exec -T -u root -w /workspace vscode git reset --hard
+          docker compose -f .devcontainer/docker-compose.yml exec -e GITHUB_TOKEN=${GITHUB_TOKEN} -T -u root vscode bash -c 'echo "GITHUB_TOKEN: ${GITHUB_TOKEN}" > ~/.github.yml'
+          docker compose -f .devcontainer/docker-compose.yml exec -T -u root vscode bash -c 'echo "DOCKERHUB_USER: cgibot" > ~/.dockerhub.yml'
+          docker compose -f .devcontainer/docker-compose.yml exec -e DOCKERHUB_PASS=${DOCKERHUB_PASS} -T -u root vscode bash -c 'echo "DOCKERHUB_PASS: ${DOCKERHUB_PASS}" >> ~/.dockerhub.yml'
+          docker compose -f .devcontainer/docker-compose.yml exec \
             -T \
             -u root \
             -w /workspace \
             -e PATH=/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/go/bin:/home/vscode/go/bin:/workspace/bin \
             vscode \
             neon -props "{tag: ${CI_ACTION_REF_NAME}, MODULE: github.com/${CI_REPOSITORY,,}, BY: ${CI_ACTOR}, dockerfiles: {'Dockerfile': '.'}, latest: true}" publish docker-push build-web-wasm
       - name: Test version
-        run: docker-compose -f .devcontainer/docker-compose.yml exec -T -u root vscode /workspace/bin/dist/cmd/pimo_linux_amd64_v1/pimo --version
+        run: docker compose -f .devcontainer/docker-compose.yml exec -T -u root vscode /workspace/bin/dist/cmd/pimo_linux_amd64_v1/pimo --version
       - name: Push pimo play to github pages repository
         uses: cpina/github-action-push-to-another-repository@v1.7.2
         env:
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -14,6 +14,10 @@ Types of changes
- `Fixed` for any bug fixes.
- `Security` in case of vulnerabilities.

## [1.23.0]

- `Added` mask `timeline` to generate timelines from rules and constraints between dates

## [1.22.2]

- `Fixed` load data for unique cache
117 changes: 117 additions & 0 deletions README.md
@@ -15,6 +15,16 @@ PIMO is a tool for data masking. It can mask data from a JSONline stream and ret
You can use [LINO](https://github.com/CGI-FR/LINO) to extract sample data from a database, which you can then use as input data for PIMO's data masking.
You can also generate data with a simple yaml configuration file.

**Capabilities**
- credibility : generated data is not distinguishable from real data
- data synthesis : generate data from nothing
- data masking, including
  - randomization : protect personal or sensitive data by writing over it
  - pseudonymization, on 3 levels
    - consistent pseudonymization : real value A is always replaced by pseudo-value X, but X can also be attributed to values other than A
    - identifying pseudonymization : real value A is always replaced by pseudo-value X, and X *CANNOT* be attributed to values other than A
    - reversible pseudonymization : real value A can be recovered from pseudo-value X

## Configuration file needed

PIMO requires a yaml configuration file to work. By default, the file is named `masking.yml` and is placed in the working directory. The file must respect the following format :
@@ -122,6 +132,7 @@ The following types of masks can be used :
  * [`randomChoiceInUri`](#randomchoiceinuri) is to mask with a random value from an external resource.
  * [`randomChoiceInCSV`](#randomchoiceincsv) is to mask with a random value from an external CSV resource.
  * [`transcode`](#transcode) is to mask a value randomly with character class preservation.
  * [`timeline`](#timeline) is to generate a set of dates related to each other (by rules and constraints).
* K-Anonymization
  * [`range`](#range) is to mask an integer value by a range of values (e.g. replace `5` by `[0,10]`).
  * [`duration`](#duration) is to mask a date by adding or removing a certain number of days.
@@ -1032,6 +1043,112 @@ Here is the result of execution:

[Return to list of masks](#possible-masks)

### Timeline

[![Try it](https://img.shields.io/badge/-Try%20it%20in%20PIMO%20Play-brightgreen)](https://cgi-fr.github.io/pimo-play/#c=G4UwTgzglg9gdgLgAQCICMKBQEQgCbIAsATJgLYCGEA1lHAOYKZJIC0SOANiAMYAuMMExYikAKwjwADhT4ALZCj5QyITnRBZRlGhGGi2SCngKotB9stXq4IfQZEQ+FMH3sORcCqsVOXfFCQAYiQvVSR5ECQAMyhIPiQpGDoEugi5KKs1DWYPUWAKTgBXO1RiAAZygDZWcrRa4gAVNABWBHLCdpaALUCQmClleEKkZB4isDAQOAS8WSioaNCYBIgpXkWofFy86MFKNzLKmrqGvqQYIr4pK5j92TuwdMyVbNsdjySUvQ+89jDSigAEZxeTmPLaOiKVgABQAHOUAJrnTgwADu4CQQMucDwj3SUAgSDmfCiAEkAMoAeSQcKqdWJE1ksDgvwhlAAHtCYWg4cjgkgilJ1k9sUVcfj5ITifMkJSaXSGXgmUNWRCRP9vICePA+GAKPxweqYmAYGRFCDXHJzmLcfgkFNOMzQBEYKhLWCkAAKRbLVbrHibfAAGmecAdamdmTdflciWSMwAlGy8mQoagANQ8vlG9WcxRZwhI3MQnVwJz677uY2GIEgPZTXzOVwlhyanyoKSmsgrFmtjzRbuKMt6g0BFMeNOITMw8r9hz5mctZGYIA&i=N4XyA)

This mask can generate multiple dates related to each other, for example :

```yaml
version: "1"
seed: 42
masking:
  - selector:
      jsonpath: "timeline"
    masks:
      - add: ""
      - timeline:
          start:
            name: "start" # name the first point in the timeline
            value: "2006-01-02T15:04:05Z" # optional : current date if not specified
          format: "2006-01-02" # output format for the timeline
          points:
            - name: "birth"
              min: "-P80Y" # lower bound for this date ISO 8601 duration
              max: "-P18Y" # upper bound for this date ISO 8601 duration
            - name: "contract"
              from: "birth" # bounded relative to "birth" (if not specified, then relative to start point)
              min: "+P18Y"
              max: "+P40Y"
            - name: "promotion"
              from: "contract"
              min: "+P0"
              max: "+P5Y"
```

Will generate :

```console
$ pimo --empty-input
{"timeline":{"start":"2006-01-02","birth":"1980-12-01","contract":"2010-07-16","promotion":"2010-12-06"}}
```
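
The chaining of points can be pictured with a minimal Go sketch. It is not PIMO's implementation: `pickBetween` and `demoTimeline` are hypothetical names, ISO 8601 durations are reduced to whole years, and the random draw is assumed to be uniform at one-second resolution. Each point is drawn inside a window positioned relative to its reference point (`from`), which is why the seed does not reproduce the values shown above.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// pickBetween returns an instant drawn uniformly between ref shifted by
// minYears and ref shifted by maxYears (both inclusive, one-second resolution).
// Hypothetical helper: whole years stand in for ISO 8601 durations.
func pickBetween(rng *rand.Rand, ref time.Time, minYears, maxYears int) time.Time {
	lower := ref.AddDate(minYears, 0, 0).Unix()
	upper := ref.AddDate(maxYears, 0, 0).Unix()
	if upper <= lower {
		return time.Unix(lower, 0).UTC()
	}
	return time.Unix(lower+rng.Int63n(upper-lower+1), 0).UTC()
}

// demoTimeline chains three points in the same way as the YAML example:
// birth is relative to start, contract to birth, promotion to contract.
func demoTimeline(seed int64) (start, birth, contract, promotion time.Time) {
	rng := rand.New(rand.NewSource(seed))
	start = time.Date(2006, time.January, 2, 15, 4, 5, 0, time.UTC)
	birth = pickBetween(rng, start, -80, -18)
	contract = pickBetween(rng, birth, 18, 40)
	promotion = pickBetween(rng, contract, 0, 5)
	return start, birth, contract, promotion
}

func main() {
	const layout = "2006-01-02" // same output format as the example
	start, birth, contract, promotion := demoTimeline(42)
	fmt.Println(start.Format(layout), birth.Format(layout),
		contract.Format(layout), promotion.Format(layout))
}
```

Because every window is anchored on a previously drawn point, the ordering birth, contract, promotion holds by construction in this sketch, without any constraint.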

#### Constraints

`before` and `after` constraints can be set to create better timelines, for example :

```yaml
- name: "begin"
  min: "P0"
  max: "+P80Y"
- name: "end"
  min: "P0"
  max: "+P80Y"
  constraints:
    - before: "begin"
```

The dates `begin` and `end` will both be chosen from the same interval, but `end` will always be after `begin`.

To enforce this, the timeline mask regenerates all dates until all constraints are met, up to 200 retries. If there are still unsatisfied constraints after 200 attempts, the mask sets the date to `null`.

This default behavior can be changed with the following parameters :

- `retry` sets the maximum number of retries (it can be set to `0` to disable retrying)

```yaml
- timeline:
    start:
      name: "start"
      value: "2006-01-02T15:04:05Z"
    format: "2006-01-02"
    retry: 0 # constraints will fail immediately if not satisfied
```

- `onError` changes the default behavior, which sets the date to `null` if constraints cannot be satisfied. The following values are accepted :
  - `default` : use a default value; this is the standard behavior when `onError` is unset (see next item for how to change the default value)
  - `reject` : fail masking of the current line with an error

`onError` is defined on each constraint, for example :

```yaml
- name: "begin"
  min: "P0"
  max: "+P80Y"
- name: "end"
  min: "P0"
  max: "+P80Y"
  constraints:
    - before: "begin"
      onError: "reject"
```

- `default` sets the default value to use when an error occurs; if not set, `null` is used

```yaml
- name: "begin"
  min: "P0"
  max: "+P80Y"
- name: "end"
  min: "P0"
  max: "+P80Y"
  constraints:
    - before: "begin"
  default: "begin" # use begin date if constraint can't be satisfied
```
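
How `retry`, `onError` and `default` interact can be sketched in Go as follows. This is an illustration under stated assumptions, not the code of the mask: `settings`, `resolveEnd` and `errRejected` are hypothetical names, dates are reduced to integer offsets, only the constrained point is redrawn (the README says all dates are regenerated), and `retry: N` is assumed to mean one initial attempt plus N retries.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// errRejected is returned when onError is "reject" and the constraint still fails.
var errRejected = errors.New("timeline: constraint not satisfied, line rejected")

// settings gathers the three documented parameters; field names are illustrative.
type settings struct {
	retry    int    // maximum number of retries (0 disables retrying)
	onError  string // "default" (or empty) or "reject"
	fallback *int64 // value used on failure; nil stands for JSON null
}

// resolveEnd draws candidates for "end" until one is strictly greater than begin.
// It makes 1+retry attempts (assumption), then applies the onError policy.
func resolveEnd(draw func() int64, begin int64, s settings) (*int64, error) {
	for attempt := 0; attempt <= s.retry; attempt++ {
		candidate := draw()
		if candidate > begin {
			return &candidate, nil
		}
	}
	if s.onError == "reject" {
		return nil, errRejected
	}
	return s.fallback, nil
}

func main() {
	rng := rand.New(rand.NewSource(42))
	draw := func() int64 { return rng.Int63n(81) } // offset in years within [0, 80]
	begin := draw()
	end, err := resolveEnd(draw, begin, settings{retry: 200})
	switch {
	case err != nil:
		fmt.Println("rejected:", err)
	case end == nil:
		fmt.Println("begin:", begin, "end: null")
	default:
		fmt.Println("begin:", begin, "end:", *end)
	}
}
```

Passing the draw as a function keeps the retry policy independent of how dates are generated, which makes the three failure outcomes (null, default value, rejection) easy to exercise separately.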

[Return to list of masks](#possible-masks)

### XML

[![Try it](https://img.shields.io/badge/-Try%20it%20in%20PIMO%20Play-brightgreen)](http://cgi-fr.github.io/pimo-play/#c=G4UwTgzglg9gdgLgAQCICMKBQBbAhhAayjgHMFMkkBaJCEAGxAGMAXGMcyrpAKwngAOuFgAtkKJvBYg4LLNzyFO3JAA9s9ZSrVDR4uDGnztSAMRJRIJELAyWSGACMezewApcSACYw8xAJRIAO5Q9PRIjla4TEwgENCOjMFQohYiVigA+ihIwLhgULiJVnC42CAAdBQmxC6sAAr5duLZ1dqKRKRaKjR0jKzs3Sa8-HC6YqgAArgArqLsxsMdQybmACowSDN0SJ42drm49DNWxBYg2AL0wiDIAN53FUgA1Ei1ro22sgByZVavT1efHgAGkQABPAC+kLaw3Ol2u0nEDwqmQqLBSjGhix6tAYrkGsOGwLGwgmKC8Nxx7XwBAgK20NDAuDgXgAIjcGXDKdIALLEcRoACcAHYAAxUMVoSVoNZisUIeWKsUALWpcMoPJAvNwqnEACYxYaZTK5QqlfK1USTDRpAjOagHsQvCBVEgngAdCk3L3YoA&i=N4KABGBEAuCW0BsCmkBcUC2BPMAjBA9gOZgB2B0KANOFAMYGmVNpQA85lYAhgK7QALAgCcAvAHIAUgQGkwAEQJJxAPjYATbpRUBGAAwB6fQYBMekwGY2BzdoAqA2AGcwznmQpIwBAGZhseIREAHShwdacSCqQIAC+QA)
2 changes: 2 additions & 0 deletions internal/app/pimo/pimo.go
@@ -59,6 +59,7 @@ import (
	"github.com/cgi-fr/pimo/pkg/template"
	"github.com/cgi-fr/pimo/pkg/templateeach"
	"github.com/cgi-fr/pimo/pkg/templatemask"
	"github.com/cgi-fr/pimo/pkg/timeline"
	"github.com/cgi-fr/pimo/pkg/transcode"
	"github.com/cgi-fr/pimo/pkg/weightedchoice"
	"github.com/cgi-fr/pimo/pkg/xml"
@@ -301,6 +302,7 @@ func injectMaskFactories() []model.MaskFactory {
		hashcsv.Factory,
		findincsv.Factory,
		xml.Factory,
		timeline.Factory,
	}
}

87 changes: 87 additions & 0 deletions pkg/axis/constraint.go
@@ -0,0 +1,87 @@
// Copyright (C) 2024 CGI France
//
// This file is part of PIMO.
//
// PIMO is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// PIMO is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with PIMO. If not, see <http://www.gnu.org/licenses/>.

package axis

type ConstraintBehavior int

const (
	Reject ConstraintBehavior = iota
	Nullify
	Replace
	// Default
)

type Constraint interface {
	Validate(value int64, points map[string]*int64) bool
	Behavior() ConstraintBehavior
}

type constraint struct {
	behavior  ConstraintBehavior
	reference string
}

func (c constraint) Behavior() ConstraintBehavior {
	return c.behavior
}

func LowerThan(reference string, behavior ConstraintBehavior) Constraint {
	return lowerThan{
		constraint: constraint{
			behavior:  behavior,
			reference: reference,
		},
	}
}

type lowerThan struct {
	constraint
}

func (lt lowerThan) Validate(value int64, points map[string]*int64) bool {
	ref := points[lt.reference]

	if ref == nil {
		return true
	}

	return value < *ref
}

func GreaterThan(reference string, behavior ConstraintBehavior) Constraint {
	return greaterThan{
		constraint: constraint{
			behavior:  behavior,
			reference: reference,
		},
	}
}

type greaterThan struct {
	constraint
}

func (gt greaterThan) Validate(value int64, points map[string]*int64) bool {
	ref := points[gt.reference]

	if ref == nil {
		return true
	}

	return value > *ref
}
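
The `Validate` semantics above can be exercised with a small standalone sketch. Since the `axis` package cannot be imported in a self-contained example, `validator`, `before`, `after` and `allValid` are local, hypothetical stand-ins that copy the comparison logic of `lowerThan` and `greaterThan`. The point it illustrates is that a reference which is missing or still `nil` makes the constraint pass.

```go
package main

import "fmt"

// validator mirrors the Validate method of the Constraint interface.
type validator interface {
	Validate(value int64, points map[string]*int64) bool
}

// before is a local stand-in for lowerThan: value must be strictly lower
// than the referenced point, unless that point is unknown or nil.
type before string

func (b before) Validate(value int64, points map[string]*int64) bool {
	ref := points[string(b)]
	return ref == nil || value < *ref
}

// after is a local stand-in for greaterThan.
type after string

func (a after) Validate(value int64, points map[string]*int64) bool {
	ref := points[string(a)]
	return ref == nil || value > *ref
}

// allValid reports whether value satisfies every constraint.
func allValid(value int64, points map[string]*int64, cs ...validator) bool {
	for _, c := range cs {
		if !c.Validate(value, points) {
			return false
		}
	}
	return true
}

func main() {
	begin := int64(10)
	points := map[string]*int64{"begin": &begin, "pending": nil}

	fmt.Println(allValid(15, points, after("begin")))                      // true
	fmt.Println(allValid(5, points, after("begin")))                       // false
	fmt.Println(allValid(5, points, after("pending"), before("missing"))) // true
}
```

Treating an unresolved reference as valid means a constraint never fails merely because the point it refers to has not been generated yet.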