Add support for more selectors #28

Merged 3 commits on Mar 17, 2017. Changes shown are from 1 commit.
Add support for requiring text content when matching.
With this commit, a node selected by a CSS selector can additionally
be required to have text matching a given regexp.

E.g.,

`<h2>This is a title</h2>` with `"requiretext": "^This is"` will be
considered a match, while with `"requiretext": "^No"` it will not.
antiagainst committed Jan 14, 2017
commit cdaa4aae2ea4b047fb32dd3497762a6a626850ce
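The matching rule described in the commit message can be tried with Go's standard `regexp` package. This is a minimal sketch, assuming the node's text has already been extracted as the plain string `This is a title`; `requireTextMatches` is a hypothetical helper, not a dashing function.

```go
package main

import (
	"fmt"
	"regexp"
)

// requireTextMatches is a hypothetical helper: it reports whether text
// satisfies the given "requiretext" pattern.
func requireTextMatches(pattern, text string) bool {
	return regexp.MustCompile(pattern).MatchString(text)
}

func main() {
	title := "This is a title" // text of <h2>This is a title</h2>
	fmt.Println(requireTextMatches("^This is", title)) // true: node is selected
	fmt.Println(requireTextMatches("^No", title))      // false: node is skipped
}
```

Because `MatchString` reports a match anywhere in the string, the `^` anchor is what restricts the check to the start of the text.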
README.md: 1 addition, 0 deletions

````diff
@@ -141,6 +141,7 @@ The format of the selector value is:
 
 ```json
 "css selector": {
+"requiretext": "require that the text matches a regexp. If not, this node is not considered as selected",
 "type": "Dash data type",
 "attr": "Use the value of the specified attribute instead of html node text as the basis for transformation",
 "regexp": "PCRE regular expression (no need to enclose in //)",
````
dashing.go: 15 additions, 2 deletions

```diff
@@ -77,6 +77,7 @@ type Transform struct {
 	Attribute string // Use the value of this attribute as basis
 	Regexp *regexp.Regexp // Perform a replace operation on the text
 	Replacement string
+	RequireText *regexp.Regexp // Require text matches the given regexp
 	MatchPath *regexp.Regexp // Skip files that don't match this path
 }
 
```
```diff
@@ -227,7 +228,7 @@ func decodeSelectField(d *Dashing) error {
 		} else if rv.Kind() == reflect.Map {
 			val := val.(map[string]interface{})
 			var ttype, trep, attr string
-			var creg, cmatchpath *regexp.Regexp
+			var creg, cmatchpath, requireText *regexp.Regexp
 			var err error
 
 			if r, ok := val["attr"]; ok {
```
```diff
@@ -246,6 +247,12 @@ func decodeSelectField(d *Dashing) error {
 			if r, ok := val["replacement"]; ok {
 				trep = r.(string)
 			}
+			if r, ok := val["requiretext"]; ok {
+				requireText, err = regexp.Compile(r.(string))
+				if err != nil {
+					return fmt.Errorf("failed to compile regexp '%s': %s", r.(string), err)
+				}
+			}
 			if r, ok := val["matchpath"]; ok {
 				cmatchpath, err = regexp.Compile(r.(string))
 				if err != nil {
```
```diff
@@ -257,6 +264,7 @@ func decodeSelectField(d *Dashing) error {
 				Attribute: attr,
 				Regexp: creg,
 				Replacement: trep,
+				RequireText: requireText,
 				MatchPath: cmatchpath,
 			}
 		} else {
```
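The new `requiretext` branch of `decodeSelectField` can be isolated as below. This is a minimal sketch; `compileRequireText` is a hypothetical name. A missing key yields a nil `*regexp.Regexp`, as in the PR, where `requireText` stays nil and the text check is skipped. One deliberate difference: the sketch uses a comma-ok type assertion, because the unchecked `r.(string)` in the diff panics if the JSON value is not a string (for example, a number).

```go
package main

import (
	"fmt"
	"regexp"
)

// compileRequireText mirrors the "requiretext" branch of decodeSelectField:
// a missing key leaves the result nil (no text filtering); a present key must
// hold a valid regular expression.
func compileRequireText(val map[string]interface{}) (*regexp.Regexp, error) {
	r, ok := val["requiretext"]
	if !ok {
		return nil, nil
	}
	// Comma-ok assertion: report a non-string value instead of panicking.
	pattern, ok := r.(string)
	if !ok {
		return nil, fmt.Errorf("requiretext must be a string, got %T", r)
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("failed to compile regexp '%s': %s", pattern, err)
	}
	return re, nil
}

func main() {
	re, err := compileRequireText(map[string]interface{}{"requiretext": "^This is"})
	fmt.Println(re, err) // ^This is <nil>
	_, err = compileRequireText(map[string]interface{}{"requiretext": "("})
	fmt.Println(err != nil) // true
}
```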
```diff
@@ -462,11 +470,16 @@ func parseHTML(path string, source_depth int, dest string, dashing Dashing) ([]*
 			m := css.MustCompile(pattern)
 			found := m.MatchAll(top)
 			for _, n := range found {
+				textString := text(n)
+				if sel.RequireText != nil && !sel.RequireText.MatchString(textString) {
+					fmt.Printf("Skipping entry for '%s' (Text not matching given regexp '%v')\n", textString, sel.RequireText)
+					continue
+				}
 				var name string
 				if len(sel.Attribute) != 0 {
 					name = attr(n, sel.Attribute)
 				} else {
-					name = text(n)
+					name = textString
 				}
 
 				// Skip things explicitly ignored.
```