How to use tinkr to retrieve Markdown image insertions in order to convert them into .Rmd code chunks containing a call to knitr::include_graphics()? #37
What I have for now.
library("tinkr")
library("xml2")
path <- tempfile()
brio::write_lines(
c("![logoR](logoR.jpg)", "![logoTidyverse](logoTidyverse.png)"),
path
)
ex1 <- tinkr::yarn$new(path)
# find all images
images <- xml_find_all(
x = ex1$body,
xpath = ".//md:image",
ns = ex1$ns
)
handle_image <- function(image, ex1) {
destination <- xml2::xml_attr(image, "destination")
text <- xml2::xml_text(image)
new_text <- glue::glue(
'```{r @text$, echo=FALSE, fig.cap="@text$", out.width = "100%"}
knitr::include_graphics("@destination$")
```\n',
.open = "@",
.close = "$"
)
# Add new Markdown text
ex1$add_md(new_text, where = 1L)
# Remove original image node
xml2::xml_remove(image)
}
purrr::walk(images, handle_image, ex1 = ex1)
ex1$write(path)
brio::read_lines(path)
#> [1] "![logoR](logoR.jpg)"
#> [2] "![logoTidyverse](logoTidyverse.png)"
#> [3] ""
#> [4] "```{r logoTidyverse, echo=FALSE, fig.cap=\"logoTidyverse\", out.width = \"100%\"}"
#> [5] " knitr::include_graphics(\"logoTidyverse.png\")"
#> [6] "```"
#> [7] ""
#> [8] "```{r logoR, echo=FALSE, fig.cap=\"logoR\", out.width = \"100%\"}"
#> [9] " knitr::include_graphics(\"logoR.jpg\")"
#> [10] "```"
#> [11] ""
#> [12] ""
Created on 2021-04-22 by the reprex package (v1.0.0.9001)
Ah, now the image removal works:
library("tinkr")
library("xml2")
path <- tempfile()
brio::write_lines(
c("![logoR](logoR.jpg)", "![logoTidyverse](logoTidyverse.png)"),
path
)
ex1 <- tinkr::yarn$new(path)
# find all images
images <- xml_find_all(
x = ex1$body,
xpath = ".//md:image",
ns = ex1$ns
)
images_copy <- images
xml2::xml_remove(images)
handle_image <- function(image, ex1) {
destination <- xml2::xml_attr(image, "destination")
text <- xml2::xml_text(image)
new_text <- glue::glue(
'```{r @text$, echo=FALSE, fig.cap="@text$", out.width = "100%"}
knitr::include_graphics("@destination$")
```\n',
.open = "@",
.close = "$"
)
# Add new Markdown text
ex1$add_md(new_text, where = 1L)
}
purrr::walk(images, handle_image, ex1 = ex1)
ex1$write(path)
brio::read_lines(path)
#> [1] ""
#> [2] ""
#> [3] ""
#> [4] "```{r logoTidyverse, echo=FALSE, fig.cap=\"logoTidyverse\", out.width = \"100%\"}"
#> [5] " knitr::include_graphics(\"logoTidyverse.png\")"
#> [6] "```"
#> [7] ""
#> [8] "```{r logoR, echo=FALSE, fig.cap=\"logoR\", out.width = \"100%\"}"
#> [9] " knitr::include_graphics(\"logoR.jpg\")"
#> [10] "```"
#> [11] ""
#> [12] ""
Created on 2021-04-22 by the reprex package (v1.0.0.9001)
Now, to add the chunks in the right position, I need to find the original position of the images. 🤔
I wonder whether tinkr is lacking a feature @zkamvar: maybe something like
Is there no way to retrieve the position via the "character number" corresponding to the retrieved XPath node?
What do you mean?
The code below seems to work, but it uses the unexported tinkr:::clean_content():
library("tinkr")
library("xml2")
path <- tempfile()
brio::write_lines(
c("![logoR](logoR.jpg)", "something else", "![logoTidyverse](logoTidyverse.png)"),
path
)
ex1 <- tinkr::yarn$new(path)
# find all images
images <- xml_find_all(
x = ex1$body,
xpath = ".//md:image",
ns = ex1$ns
)
handle_image <- function(image, ex1) {
destination <- xml2::xml_attr(image, "destination")
text <- xml2::xml_text(image)
new_text <- glue::glue(
'\n```{r @text$, echo=FALSE, fig.cap="@text$", out.width = "100%"}
knitr::include_graphics("@destination$")
```\n',
.open = "@",
.close = "$"
)
new <- tinkr:::clean_content(new_text)
new <- commonmark::markdown_xml(new, extensions = TRUE)
new <- xml2::xml_ns_strip(xml2::read_xml(new))
xml2::xml_replace(image, new)
}
purrr::walk(images, handle_image, ex1 = ex1)
ex1$write(path)
brio::read_lines(path)
#> [1] "```{r logoR, echo=FALSE, fig.cap=\"logoR\", out.width = \"100%\"}"
#> [2] " knitr::include_graphics(\"logoR.jpg\")"
#> [3] "```"
#> [4] ""
#> [5] "something else"
#> [6] "```{r logoTidyverse, echo=FALSE, fig.cap=\"logoTidyverse\", out.width = \"100%\"}"
#> [7] " knitr::include_graphics(\"logoTidyverse.png\")"
#> [8] "```"
#> [9] ""
#> [10] ""
#> [11] ""
Created on 2021-04-22 by the reprex package (v1.0.0.9001)
By the position I mean this:
Ah no, the position for xml2 is something different, I think :-)
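This is what I use to get the line positions of elements in {pegboard}:
# Get the position of an element from its "sourcepos" attribute, which is
# formatted as "linestart:colstart-lineend:colend" (the attribute is only
# present when the XML was generated with sourcepos = TRUE)
get_pos <- function(x, e = 1) {
  as.integer(
    gsub(
      # the "+?" sits inside each group so that multi-digit values are kept whole
      "^(\\d+?):(\\d+?)[-](\\d+?):(\\d+?)$",
      glue::glue("\\{e}"),
      xml2::xml_attr(x, "sourcepos")
    )
  )
}
# helpers for get_pos
get_linestart <- function(x) get_pos(x, e = 1)
get_colstart <- function(x) get_pos(x, e = 2)
get_lineend <- function(x) get_pos(x, e = 3)
get_colend <- function(x) get_pos(x, e = 4)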
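To make the "linestart:colstart-lineend:colend" layout concrete, here is a minimal base R sketch that splits such strings into their four integer fields. The function name parse_sourcepos() and the sample strings are made up for illustration; they are not part of tinkr, xml2 or {pegboard}.

```r
# Hypothetical helper (an assumption for illustration, not a tinkr or
# pegboard function): split "linestart:colstart-lineend:colend" strings
# into an integer matrix with one row per input string.
parse_sourcepos <- function(sourcepos) {
  pattern <- "^([0-9]+):([0-9]+)-([0-9]+):([0-9]+)$"
  parts <- regmatches(sourcepos, regexec(pattern, sourcepos))
  fields <- vapply(
    parts,
    function(p) {
      # a full match yields 5 elements: the whole string plus 4 groups
      if (length(p) == 5L) as.integer(p[-1L]) else rep(NA_integer_, 4L)
    },
    integer(4L)
  )
  out <- t(fields)
  colnames(out) <- c("linestart", "colstart", "lineend", "colend")
  out
}

# Made-up sourcepos values, as they could appear on two image nodes
pos <- parse_sourcepos(c("1:1-1:19", "3:1-3:35"))
pos[, "linestart"]
pos[, "colend"]
```

With real nodes, the input would be the character vector returned by xml2::xml_attr(nodes, "sourcepos"), assuming the document was parsed with source positions enabled.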
ooooh that's how it works, thank you @zkamvar! Now I wonder whether we still need to use replacement instead of adding. 🤔
I think replacement might still be a worthwhile option.
@pokyah do you have remaining problems?
👋 @pokyah, could you please comment in that issue as it's still open, with some more context on what you're trying to achieve? Thank you!
Hello,
My goal is to convert a Markdown document to an R Markdown (.Rmd) document where the pictures are inserted using knitr::include_graphics() inside a code chunk, so that I can knit it back to Markdown format with numbered figures. I have easily changed the extension to .Rmd and added the proper YAML header containing the rendering options. Now I'm stuck with the figure numbering. I was thinking of using regex to find and replace all the image insertions, but someone suggested that I use the tinkr package.
I guess that by parsing the document, I can retrieve all the expressions corresponding to an image insertion and then replace them with a character string containing the code chunk.
Here is a reprex:
=====
dummy.md
Markdown document content:
====
How can I parse this document in order to find all the image insertions:
and replace them with .Rmd code chunks as follows:
?
Maybe tinkr is not the best solution to achieve this. Sorry if that is the case.
Thanks for your support
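For comparison, the regex route mentioned in the question can be sketched in base R. This is only an illustration under simplifying assumptions: images_to_chunks() is a hypothetical helper (not part of tinkr or knitr), it only handles lines that consist of exactly one image of the form ![alt](path), and it reuses the alt text as the chunk label, which may not be a valid label if the alt text contains spaces or punctuation.

```r
# Hypothetical helper (an assumption for illustration): replace every line
# that holds exactly one Markdown image with a knitr chunk calling
# knitr::include_graphics(). All other lines are returned unchanged.
images_to_chunks <- function(lines) {
  # group 1 = alt text, group 2 = image path (no spaces, no title part)
  pattern <- "^!\\[([^]]*)\\]\\(([^) ]+)\\)[[:space:]]*$"
  fence <- strrep("`", 3)
  out <- lapply(lines, function(line) {
    m <- regmatches(line, regexec(pattern, line))[[1]]
    if (length(m) != 3L) {
      return(line) # not an image-only line: keep as is
    }
    c(
      sprintf(
        "%s{r %s, echo=FALSE, fig.cap=\"%s\", out.width=\"100%%\"}",
        fence, m[2], m[2]
      ),
      sprintf("knitr::include_graphics(\"%s\")", m[3]),
      fence
    )
  })
  unlist(out, use.names = FALSE)
}

# Same sample input as in the reprex with a line of text between two images
md <- c(
  "![logoR](logoR.jpg)",
  "something else",
  "![logoTidyverse](logoTidyverse.png)"
)
rmd <- images_to_chunks(md)
writeLines(rmd)
```

Images embedded inside a paragraph, images with titles, and reference-style images are not covered by this pattern, which is one reason a parser-based approach such as tinkr can be more robust.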