Feat xml support #270

Merged
merged 17 commits into from
Nov 20, 2023
Changes from 13 commits
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,10 @@ Types of changes
- `Fixed` for any bug fixes.
- `Security` in case of vulnerabilities.

## [1.20.0]

- `Added` new features for parsing and masking XML files using the following command: `cat XMLfile | pimo xml --subscriber <parent tag name>=<mask name> > outputXMLfile`. This feature supports all level 1 elements that are not arrays.

## [1.19.0]

- `Added` new features for ff1 mask : `domain`, `preserve` and `onError`.
122 changes: 122 additions & 0 deletions README.md
Expand Up @@ -955,6 +955,128 @@ By default, if not specified otherwise, these classes will be used (input -> out

[Return to list of masks](#possible-masks)


### Parsing XML files

To mask data in an XML file with PIMO, run the following command:

```bash
cat data.xml | pimo xml --subscriber parentTagName=MaskName.yml > maskedData.xml
```

PIMO masks the text of the selected tags inside the given parent tag and writes the whole document, masked values included, to the output XML file. The selected tags must not contain nested tags.

To mask attribute values, reference them in the `jsonpath` selector of the masking file using these rules:

* For an attribute of the parent tag, use `@attributeName`.
* For an attribute of a child tag, use `childTagName@attributeName`.
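
The key naming above can be illustrated with a small standalone sketch. It uses only Go's standard `encoding/xml` package and a hypothetical helper, `flattenElement`; it is not PIMO's or xixo's implementation, only an assumption about how one parent element could be turned into the flat keys that the selectors refer to.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// flattenElement (hypothetical helper) reads one parent element from src and
// returns a flat map using the selector naming rules:
//
//	"@attr"      attribute of the parent tag
//	"child"      text of a direct child tag
//	"child@attr" attribute of a direct child tag
//
// Anything nested deeper than a direct child is ignored.
func flattenElement(src string) (map[string]string, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	out := map[string]string{}
	depth := 0
	child := ""
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			depth++
			switch depth {
			case 1: // the parent tag itself
				for _, a := range t.Attr {
					out["@"+a.Name.Local] = a.Value
				}
			case 2: // a direct child tag
				child = t.Name.Local
				out[child] = ""
				for _, a := range t.Attr {
					out[child+"@"+a.Name.Local] = a.Value
				}
			}
		case xml.CharData:
			if depth == 2 {
				out[child] += string(t)
			}
		case xml.EndElement:
			depth--
		}
	}
	return out, nil
}

func main() {
	m, err := flattenElement(`<account type="classic"><name age="25">Doe</name><account_number>12345</account_number></account>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, k := range []string{"@type", "name", "name@age", "account_number"} {
		fmt.Printf("%s = %s\n", k, m[k])
	}
}
```

Each key printed by `main` is a value that a `jsonpath` selector in the masking file can target.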

For example, consider an XML file named data.xml:

**`data.xml`**

```xml
<?xml version="1.0" encoding="UTF-8"?>
<taxes>
    <agency>
        <name>NewYork Agency</name>
        <agency_number>0032</agency_number>
    </agency>
    <account type="classic">
        <name age="25">Doe</name>
        <account_number>12345</account_number>
        <annual_income>50000</annual_income>
    </account>
    <account type="saving">
        <name age="50">Smith</name>
        <account_number>67890</account_number>
        <annual_income>60000</annual_income>
    </account>
</taxes>
```

In this example, you can mask the values of `agency_number` in the `agency` tag and the values of `name` and `account_number` in the `account` tag using the following command:

```bash
cat data.xml | pimo xml --subscriber agency=masking_agency.yml --subscriber account=masking_account.yml > maskedData.xml
```

**`masking_agency.yml`**

```yaml
version: "1"
seed: 42

masking:
  - selector:
      jsonpath: "agency_number" # name of the tag to mask
    mask:
      template: '{{MaskRegex "[0-9]{4}$"}}'
```

**`masking_account.yml`**

```yaml
version: "1"
seed: 42

masking:
  - selector:
      jsonpath: "name" # name of the tag to mask
    mask:
      randomChoiceInUri: "pimo://nameFR"
  - selector:
      jsonpath: "@type" # attribute of the parent tag to mask
    mask:
      randomChoice:
        - "classic"
        - "saving"
        - "securitie"
  - selector:
      jsonpath: "account_number" # name of the tag to mask
    masks:
      - incremental:
          start: 1
          increment: 1
      # incremental turns the string into an int; the template restores a string value for the XML file
      - template: "{{.account_number}}"
  - selector:
      jsonpath: "name@age" # attribute of a child tag to mask
    masks:
      - randomInt:
          min: 18
          max: 95
      # "@" is not accepted in a Go template field name, so index is used to read the value and turn the int back into a string
      - template: "{{index . \"name@age\"}}"
```
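
The two template steps in `masking_account.yml` rely on how Go templates address keys. The sketch below assumes the `template` mask behaves like Go's standard `text/template` (the `render` helper and the sample record are hypothetical, not PIMO code). It shows that a plain key can be read with the dot syntax, while a key containing `@` has to go through `index`.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render (hypothetical helper) executes a Go text/template against a flat
// record and returns the rendered string.
func render(tmpl string, record map[string]any) (string, error) {
	t, err := template.New("mask").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, record); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Values are ints here, as they would be after incremental or randomInt.
	record := map[string]any{"account_number": 1, "name@age": 33}

	// A plain identifier can be read with the dot syntax.
	s, err := render("{{.account_number}}", record)
	fmt.Printf("%q %v\n", s, err)

	// A key containing "@" is read with index instead.
	s, err = render(`{{index . "name@age"}}`, record)
	fmt.Printf("%q %v\n", s, err)

	// The dot syntax is rejected when the template is parsed.
	_, err = render("{{.name@age}}", record)
	fmt.Println("dot syntax failed:", err != nil)
}
```

In both working cases the rendered output is a string, which is what the XML writer needs.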

After executing the command with the correct configuration, here is the expected result in the file maskedData.xml:

**`maskedData.xml`**

```xml
<?xml version="1.0" encoding="UTF-8"?>
<taxes>
    <agency>
        <name>NewYork Agency</name>
        <agency_number>2308</agency_number>
    </agency>
    <account type="saving">
        <name age="33">Rolande</name>
        <account_number>1</account_number>
        <annual_income>50000</annual_income>
    </account>
    <account type="saving">
        <name age="47">Matéo</name>
        <account_number>2</account_number>
        <annual_income>60000</annual_income>
    </account>
</taxes>
```

[Return to list of masks](#possible-masks)


## `pimo://` scheme

Pimo embeds a useful list of fake data. URIs that begin with a pimo:// scheme point to the pseudo files below.
55 changes: 55 additions & 0 deletions cmd/pimo/main.go
Expand Up @@ -67,6 +67,7 @@ var (
statsTemplate string
statsDestinationEnv = os.Getenv("PIMO_STATS_URL")
statsTemplateEnv = os.Getenv("PIMO_STATS_TEMPLATE")
xmlSubscriberName map[string]string
)

func main() {
Expand Down Expand Up @@ -119,6 +120,60 @@ There is NO WARRANTY, to the extent permitted by law.`, version, commit, buildDa
fmt.Println(jsonschema)
},
})
// Add command for XML transformer
xmlCmd := &cobra.Command{
Use: "xml",
Short: "Parse and mask an XML file",
Run: func(cmd *cobra.Command, args []string) {
initLog()
if len(catchErrors) > 0 {
skipLineOnError = true
skipLogFile = catchErrors
}
config := pimo.Config{
EmptyInput: emptyInput,
RepeatUntil: repeatUntil,
RepeatWhile: repeatWhile,
Iteration: iteration,
SkipLineOnError: skipLineOnError,
SkipFieldOnError: skipFieldOnError,
SkipLogFile: skipLogFile,
CachesToDump: cachesToDump,
CachesToLoad: cachesToLoad,
XMLCallback: true,
}

parser := pimo.ParseXML(cmd.InOrStdin(), cmd.OutOrStdout())
// Map each element name given on the command line to its masking configuration
for elementName, mask := range xmlSubscriberName {
pdef, err := model.LoadPipelineDefinitionFromFile(mask)
if err != nil {
fmt.Printf("Error when loading pipeline for %s : %v\n", elementName, err)
return
}
ctx := pimo.NewContext(pdef)
if err := ctx.Configure(config); err != nil {
log.Err(err).Msg("Cannot configure pipeline")
log.Warn().Int("return", 1).Msg("End PIMO")
os.Exit(1)
}

parser.RegisterMapCallback(elementName, func(m map[string]string) (map[string]string, error) {
transformedData, err := ctx.ExecuteMap(m)
if err != nil {
return nil, err
}
return transformedData, nil
})
}
err := parser.Stream()
if err != nil {
log.Err(err).Msg("Error during parsing XML document")
}
},
}
xmlCmd.Flags().StringToStringVar(&xmlSubscriberName, "subscriber", map[string]string{}, "name of element to mask")
rootCmd.AddCommand(xmlCmd)

rootCmd.AddCommand(&cobra.Command{
Use: "flow",
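
A note on the `--subscriber` flag registered in this hunk: `StringToStringVar` collects repeated `name=value` pairs into a `map[string]string`. The sketch below reproduces that behaviour with the standard `flag` package only, so it is not the cobra/pflag code used by the PR; the `subscribers` type and the `parseSubscribers` helper are hypothetical names introduced for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// subscribers (hypothetical type) collects repeated <element>=<masking file>
// pairs. It implements flag.Value so the flag can be given several times.
type subscribers map[string]string

func (s subscribers) String() string { return fmt.Sprint(map[string]string(s)) }

func (s subscribers) Set(value string) error {
	name, file, ok := strings.Cut(value, "=")
	if !ok || name == "" || file == "" {
		return fmt.Errorf("expected <element>=<masking file>, got %q", value)
	}
	s[name] = file
	return nil
}

// parseSubscribers (hypothetical helper) parses the arguments of an
// xml-like subcommand and returns the element-to-file mapping.
func parseSubscribers(args []string) (map[string]string, error) {
	subs := subscribers{}
	fs := flag.NewFlagSet("xml", flag.ContinueOnError)
	fs.Var(subs, "subscriber", "element to mask, as <element>=<masking file>")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return subs, nil
}

func main() {
	subs, err := parseSubscribers([]string{
		"--subscriber", "agency=masking_agency.yml",
		"--subscriber", "account=masking_account.yml",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(subs["agency"], subs["account"])
}
```

Each entry of the resulting map corresponds to one masking pipeline loaded and registered as a callback in the loop above.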
1 change: 1 addition & 0 deletions go.mod
Expand Up @@ -3,6 +3,7 @@ module github.com/cgi-fr/pimo
go 1.20

require (
github.com/CGI-FR/xixo v0.1.6
github.com/Masterminds/sprig/v3 v3.2.3
github.com/adrienaury/zeromdc v0.0.0-20221116212822-6a366c26ee61
github.com/capitalone/fpe v1.2.1
2 changes: 2 additions & 0 deletions go.sum
@@ -1,3 +1,5 @@
github.com/CGI-FR/xixo v0.1.6 h1:C3BPzLmUebjXsQqaP8A6IBwtlqpRX6Pq9xd3PQp6DCg=
github.com/CGI-FR/xixo v0.1.6/go.mod h1:Q7Xf6CHqoU6hyRwPtvrUu4wCspfFYxIWZoYXTYXvtI8=
github.com/Masterminds/goutils v1.1.1 h1:5nUrii3FMTL5diU80unEVvNevw1nH4+ZV4DSLVJLSYI=
github.com/Masterminds/goutils v1.1.1/go.mod h1:8cTjp+g8YejhMuvIA5y2vz3BpJxksy863GQaJW2MFNU=
github.com/Masterminds/semver/v3 v3.2.0 h1:3MEsd0SM6jqZojhjLWWeBY+Kcjy9i6MQAeY7YgDP83g=
41 changes: 41 additions & 0 deletions internal/app/pimo/pimo.go
Expand Up @@ -77,6 +77,7 @@ type Config struct {
SkipLogFile string
CachesToDump map[string]string
CachesToLoad map[string]string
XMLCallback bool
}

type Context struct {
Expand Down Expand Up @@ -107,6 +108,9 @@ func (ctx *Context) Configure(cfg Config) error {

over.AddGlobalFields("context")
switch {
case cfg.XMLCallback:
over.MDC().Set("context", "callback-input")
ctx.source = model.NewCallableMapSource()
case cfg.EmptyInput:
over.MDC().Set("context", "empty-input")
ctx.source = model.NewSourceFromSlice([]model.Dictionary{model.NewPackedDictionary()})
Expand Down Expand Up @@ -331,3 +335,40 @@ func updateContext(counter int) {
context := over.MDC().GetString("context")
over.MDC().Set("context", re.ReplaceAllString(context, fmt.Sprintf("[%d]", counter)))
}

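// ExecuteMap runs the masking pipeline on a single flat record and returns the masked values.
// The context must be configured with XMLCallback, and every masked value must be a string.
func (ctx *Context) ExecuteMap(data map[string]string) (map[string]string, error) {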
input := model.NewDictionary()

for k, v := range data {
input = input.With(k, v)
}
source, ok := ctx.source.(*model.CallableMapSource)
if !ok {
return nil, fmt.Errorf("source is not CallableMapSource")
}
source.SetValue(input)
result := []model.Entry{}
err := ctx.pipeline.AddSink(model.NewSinkToSlice(&result)).Run()
if err != nil {
return nil, err
}

newData := make(map[string]string)

if len(result) > 0 {
newMap, ok := result[0].(model.Dictionary)
if !ok {
return nil, fmt.Errorf("result is not Dictionary")
}
unordered := newMap.Unordered()
for k, v := range unordered {
stringValue, ok := v.(string)
if !ok {
return nil, fmt.Errorf("value of %q is not a string", k)
}
newData[k] = stringValue
}
return newData, nil
}
return nil, fmt.Errorf("pipeline returned no result")
}
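
A note on the final conversion in `ExecuteMap`: the `v.(string)` assertion rejects any value that is not a string, which is why the README example follows `incremental` and `randomInt` with a `template` step. The standalone sketch below isolates that check; `toStringMap` is a hypothetical helper working on a plain `map[string]any`, not on PIMO's `model.Dictionary`.

```go
package main

import "fmt"

// toStringMap (hypothetical helper) mirrors the last step of ExecuteMap:
// every masked value must already be a string, otherwise the record is
// rejected with an error naming the offending key.
func toStringMap(record map[string]any) (map[string]string, error) {
	out := make(map[string]string, len(record))
	for k, v := range record {
		s, ok := v.(string)
		if !ok {
			return nil, fmt.Errorf("value of %q is %T, not a string", k, v)
		}
		out[k] = s
	}
	return out, nil
}

func main() {
	// A mask that produces an int leaves a non-string value: conversion fails.
	_, err := toStringMap(map[string]any{"account_number": 1})
	fmt.Println("int value:", err)

	// Once a template-like step has turned it back into a string, it passes.
	m, err := toStringMap(map[string]any{"account_number": "1"})
	fmt.Println("string value:", m["account_number"], err)
}
```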
66 changes: 66 additions & 0 deletions internal/app/pimo/pimo_test.go
Expand Up @@ -318,3 +318,69 @@ func LoadJsonLineFromDocument(filename string) (model.Dictionary, error) {

// return jsonline.JSONToDictionary(compactLine.Bytes())
}

func Test2BaliseIdentity(t *testing.T) {
definition := model.Definition{
Version: "1",
Seed: 42,
Masking: []model.Masking{
{
Selector: model.SelectorType{Jsonpath: "name"},
Mask: model.MaskType{
RandomChoiceInURI: "pimo://nameFR",
},
},
},
}
ctx := pimo.NewContext(definition)
cfg := pimo.Config{
Iteration: 1,
XMLCallback: true,
}

err := ctx.Configure(cfg)
assert.Nil(t, err)

data := map[string]string{"name": "John"}
newData1, err := ctx.ExecuteMap(data)
assert.Nil(t, err)
newData2, err := ctx.ExecuteMap(data)
assert.Nil(t, err)
assert.NotEqual(t, newData2["name"], newData1["name"])
}

func TestExecuteMapWithAttributes(t *testing.T) {
definition := model.Definition{
Version: "1",
Masking: []model.Masking{
{
Selector: model.SelectorType{Jsonpath: "name"},
Mask: model.MaskType{
HashInURI: "pimo://nameFR",
},
},
{
Selector: model.SelectorType{Jsonpath: "name@age"},
Mask: model.MaskType{
Regex: "([0-9]){2}",
},
},
},
}
ctx := pimo.NewContext(definition)
cfg := pimo.Config{
Iteration: 1,
XMLCallback: true,
}

err := ctx.Configure(cfg)
assert.Nil(t, err)

data := map[string]string{"name": "John", "name@age": "25"}

newData, err := ctx.ExecuteMap(data)

assert.Nil(t, err)
assert.NotEqual(t, "John", newData["name"])
assert.NotEqual(t, "25", newData["name@age"])
}
28 changes: 28 additions & 0 deletions internal/app/pimo/xixo.go
@@ -0,0 +1,28 @@
// Copyright (C) 2022 CGI France
//
// This file is part of PIMO.
//
// PIMO is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// PIMO is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with PIMO. If not, see <http://www.gnu.org/licenses/>.

package pimo

import (
"io"

"github.com/CGI-FR/xixo/pkg/xixo"
)

func ParseXML(input io.Reader, output io.Writer) *xixo.XMLParser {
return xixo.NewXMLParser(input, output).EnableXpath()
}