Extract and import text content of documents and blocks #1040

Closed
wants to merge 62 commits into from
Commits
19ab326
Start with adding content extraction method to blocks and documents
magdalenaxm Mar 30, 2023
6f2e1a6
Add inlineStyle and entityRange style tags in extracted text
magdalenaxm Apr 3, 2023
b901765
Start with method for content import
magdalenaxm Apr 4, 2023
6ca9327
Add import content method to all basic blocks
magdalenaxm Apr 5, 2023
afdf735
Update test file
magdalenaxm Apr 6, 2023
732f7d1
Resolve merge conflicts
magdalenaxm Apr 6, 2023
cff1d04
Remove extractTextContents from predefined page
magdalenaxm Apr 6, 2023
93c2bf3
Update description in useExtractPages
magdalenaxm Apr 6, 2023
5246db1
Rework replaceTextContents method to return block state and fix wrong…
magdalenaxm Apr 12, 2023
236cf84
Update packages/admin/cms-admin/src/blocks/createTextLinkBlock.tsx
magdalenaxm Apr 17, 2023
c1c37ea
Rename extraction method and change return type
magdalenaxm Apr 17, 2023
47e16e1
Add more tests and use inputToOutput in update mutation
magdalenaxm Apr 17, 2023
4b37025
Add indices in pseudo tags and update tests
magdalenaxm Apr 17, 2023
0e60b68
Update input variable to leave transformation defined by application
magdalenaxm Apr 19, 2023
9a11be2
Merge remote-tracking branch 'origin/next' into add-block-content-ext…
magdalenaxm Apr 21, 2023
f2ee8df
Adapt export to import format and add extractContent prop to composit…
magdalenaxm Apr 21, 2023
037b811
Start with xml export format
magdalenaxm Apr 26, 2023
ec998a1
Merge remote-tracking branch 'origin/next' into add-block-content-ext…
magdalenaxm Apr 28, 2023
6c23597
Add identifier to pseudo tags
magdalenaxm Apr 28, 2023
1d61c29
remove first implementation of richtext content extraction
magdalenaxm Apr 28, 2023
6dcbf26
Handle xml import, add scripts to export and import po files
magdalenaxm May 2, 2023
641219b
Handle missing tabs in import
magdalenaxm May 10, 2023
6808186
Merge remote-tracking branch 'origin/next' into add-block-content-ext…
magdalenaxm May 10, 2023
ec49508
Handle entity ranges in export and import, update tests
magdalenaxm May 12, 2023
d61a2a2
Rename extraction method export
magdalenaxm May 12, 2023
4ef7161
Add basic csv extraction
magdalenaxm May 12, 2023
fd7a7e1
Rename tags and state export method
magdalenaxm May 24, 2023
aabed99
Add csv import
magdalenaxm May 24, 2023
377352c
Use document output as result of extractTextContents
johnnyomair May 25, 2023
e516b5d
Draft: Replace contents in rich text block
johnnyomair May 25, 2023
b63fbf8
Remove unused links from entity map, stay in comet-block state when r…
magdalenaxm Jun 7, 2023
491dc2c
Remove extractTextContents and replaceContents from FullWidthImage
magdalenaxm Jun 7, 2023
6373597
Merge remote-tracking branch 'origin/next' into add-block-content-ext…
magdalenaxm Jun 14, 2023
5304273
Update path to generated gql variables
magdalenaxm Jun 14, 2023
4e871e0
Allow name and slug to be updated even when export and import methods…
magdalenaxm Jun 14, 2023
bb7dc16
Rename extractContent and remove scripts from package.json
magdalenaxm Jun 19, 2023
44cf026
Add extractTextContents to link in textLinkBlock
magdalenaxm Jun 19, 2023
e4a6e33
Fix file typo
magdalenaxm Jun 19, 2023
c83322b
Move methods for csv export and import to new files
magdalenaxm Jun 19, 2023
65a9b92
Update XmlToState to use complete state as parameter, renaming of xml…
magdalenaxm Jun 19, 2023
3db702c
Change filename
magdalenaxm Jun 20, 2023
59eb878
Change filename
magdalenaxm Jun 20, 2023
59a5ec2
Move draft-js to xml conversions to comet/admin
magdalenaxm Jun 20, 2023
dfd10b7
Rework content import in order to hande duplicate text with different…
magdalenaxm Jun 20, 2023
f14e518
Add more comments
magdalenaxm Jun 20, 2023
6567e08
Fix extract and import contents for settingsBlock
magdalenaxm Jun 20, 2023
5bed41f
Fix csv parsing and special characters parsing
magdalenaxm Jun 21, 2023
d58a302
Add back escaping
magdalenaxm Jun 21, 2023
af5b44f
Add changeset
magdalenaxm Jun 21, 2023
77c08d0
Move to rte package
magdalenaxm Jun 30, 2023
b92850f
Move to rte package
magdalenaxm Jun 30, 2023
f95ab38
Add papaparse for csv parsing
magdalenaxm Jun 30, 2023
eccdb6e
Add encoding for xml when importing csv
magdalenaxm Jun 30, 2023
46a633a
Update changelog
magdalenaxm Jul 11, 2023
23f5ae2
Merge remote-tracking branch 'origin/next' into add-block-content-ext…
magdalenaxm Jul 24, 2023
21fbab4
Add error handling for queries
magdalenaxm Jul 24, 2023
a9600bd
Merge remote-tracking branch 'origin/next' into add-block-content-ext…
magdalenaxm Jul 28, 2023
6bec8e1
Remove unneeded dependencies
magdalenaxm Jul 28, 2023
bc8f042
Move extract and import related files to subfolder
magdalenaxm Jul 28, 2023
865077a
Change code source
magdalenaxm Jul 28, 2023
92686cc
Fix bug when style ends inside entity range
magdalenaxm Jul 28, 2023
26dd3c0
Merge remote-tracking branch 'origin/next' into add-block-content-ext…
magdalenaxm Sep 28, 2023
7 changes: 7 additions & 0 deletions .changeset/olive-berries-work.md
@@ -0,0 +1,7 @@
---
"@comet/blocks-admin": minor
"@comet/cms-admin": minor
"@comet/admin": minor
---

Add the method `extractTextContents` to extract the text contents of blocks, and the method `replaceTextContents` to import and replace the text contents of blocks. Both methods are available through two new actions in the page menu, `Extract Content` and `Import Content`. `Extract Content` copies the text contents of the blocks to the clipboard as CSV, and `Import Content` reads CSV data from the clipboard and replaces the corresponding texts in the blocks. To preserve inline styles and entity ranges during import, they are marked with corresponding tags during export.
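The changeset describes a clipboard round trip: texts are exported as CSV, edited or translated outside the admin, and pasted back in. The sketch below illustrates that round trip under stated assumptions: the `TextContent` shape (`key`, `text`) and the helpers `toCsv`, `parseCsv` and `applyCsv` are hypothetical, and the hand-written RFC 4180 style quoting stands in for papaparse, which the commit history shows being added for CSV parsing. It is not the PR's implementation.

```typescript
// Hypothetical shape of one extracted text; the PR's actual type is not shown in this diff.
interface TextContent {
    key: string;
    text: string;
}

// Quote a field when it contains a comma, a quote or a line break (RFC 4180 style).
function escapeCsvField(value: string): string {
    return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

function toCsv(contents: TextContent[]): string {
    const rows = [["key", "text"], ...contents.map((c) => [c.key, c.text])];
    return rows.map((row) => row.map(escapeCsvField).join(",")).join("\r\n");
}

// Minimal parser for the output of toCsv: handles quoted fields containing
// commas, doubled quotes and line breaks.
function parseCsv(csv: string): string[][] {
    const rows: string[][] = [];
    let row: string[] = [];
    let field = "";
    let inQuotes = false;

    for (let i = 0; i < csv.length; i++) {
        const char = csv[i];
        if (inQuotes) {
            if (char === '"' && csv[i + 1] === '"') {
                field += '"'; // escaped quote inside a quoted field
                i++;
            } else if (char === '"') {
                inQuotes = false;
            } else {
                field += char;
            }
        } else if (char === '"') {
            inQuotes = true;
        } else if (char === ",") {
            row.push(field);
            field = "";
        } else if (char === "\r" && csv[i + 1] === "\n") {
            // skip: the following "\n" ends the row
        } else if (char === "\n") {
            row.push(field);
            rows.push(row);
            row = [];
            field = "";
        } else {
            field += char;
        }
    }
    row.push(field);
    rows.push(row);
    return rows;
}

// Replace texts whose key appears in the imported CSV; unknown keys are ignored.
function applyCsv(contents: TextContent[], csv: string): TextContent[] {
    const replacements = new Map<string, string>();
    // Skip the header row and any malformed row with fewer than two fields.
    for (const row of parseCsv(csv).slice(1)) {
        if (row.length >= 2) replacements.set(row[0], row[1]);
    }
    return contents.map((c) => {
        const text = replacements.get(c.key);
        return text === undefined ? c : { ...c, text };
    });
}
```

Quoting every field that contains a comma, quote or line break is what lets rich text survive a spreadsheet round trip; a dedicated parser such as papaparse also covers cases this sketch does not, such as delimiter auto-detection.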
1 change: 1 addition & 0 deletions demo/admin/src/common/blocks/HeadlineBlock.tsx
@@ -32,6 +32,7 @@ export const HeadlineBlock = createCompositeBlock(
<Field name="eyebrow" label="Eyebrow" component={FinalFormInput} fullWidth />
</BlocksFinalForm>
),
extractTextContent: true,
}),
},
headline: {
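The hunk above opts the `eyebrow` setting into extraction with `extractTextContent: true`. One plausible reading, which this diff does not confirm, is that a composite setting holds a plain value with no extraction logic of its own, so the flag tells the surrounding composite block to include that value on export and overwrite it on import, while untranslatable settings stay out of the CSV. The sketch below models only that reading; `Setting`, `extractSettings` and `replaceSettings` are hypothetical names, not the `@comet/blocks-admin` API.

```typescript
// Hypothetical model of composite-block settings; not the @comet/blocks-admin API.
interface Setting {
    value: string;
    extractTextContent?: boolean; // opt-in flag, mirroring the option used in HeadlineBlock
}

type Settings = Record<string, Setting>;

// Collect the values of all settings that opted in, keyed by setting name.
function extractSettings(settings: Settings): { key: string; text: string }[] {
    return Object.entries(settings)
        .filter(([, setting]) => setting.extractTextContent)
        .map(([key, setting]) => ({ key, text: setting.value }));
}

// Overwrite opted-in settings with imported texts; all other settings are left untouched,
// even if the imported data happens to contain their key.
function replaceSettings(settings: Settings, contents: { key: string; text: string }[]): Settings {
    const byKey = new Map(contents.map((c) => [c.key, c.text] as const));
    const result: Settings = {};
    for (const [key, setting] of Object.entries(settings)) {
        const text = byKey.get(key);
        result[key] = setting.extractTextContent && text !== undefined ? { ...setting, value: text } : setting;
    }
    return result;
}
```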
22 changes: 22 additions & 0 deletions demo/admin/src/pages/Page.tsx
@@ -61,4 +61,26 @@ export const Page: DocumentInterface<Pick<GQLPage, "content" | "seo">, GQLPageIn
    menuIcon: File,
    hideInMenuIcon: FileNotMenu,
    anchors: (input) => PageContentBlock.anchors?.(PageContentBlock.input2State(input.content)) ?? [],
    extractTextContents: (input) => [
        ...(PageContentBlock.extractTextContents?.(PageContentBlock.input2State(input.content)) ?? []),
        ...(SeoBlock.extractTextContents?.(SeoBlock.input2State(input.seo)) ?? []),
    ],
    replaceTextContents: (input, contents) => {
        let contentState = PageContentBlock.input2State(input.content);

        if (PageContentBlock.replaceTextContents) {
            contentState = PageContentBlock.replaceTextContents(contentState, contents);
        }

        let seoState = SeoBlock.input2State(input.seo);

        if (SeoBlock.replaceTextContents) {
            seoState = SeoBlock.replaceTextContents(seoState, contents);
        }

        return {
            content: PageContentBlock.state2Output(contentState),
            seo: SeoBlock.state2Output(seoState),
        };
    },
};
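`Page.tsx` wires the two methods by hand for its `content` and `seo` blocks: it concatenates what each block extracts, and it passes the complete imported list to every block, so each block has to pick out its own entries. A sketch of the same pattern generalized to any number of root blocks follows. The `BlockLike` interface and the `extractAll`/`replaceAll` helpers are hypothetical, and they operate on block state directly, leaving out the `input2State`/`state2Output` conversions that the real document performs.

```typescript
// Hypothetical, simplified block interface: only the two optional text-content methods.
interface TextContent {
    key: string;
    text: string;
}

interface BlockLike<S> {
    extractTextContents?: (state: S) => TextContent[];
    replaceTextContents?: (state: S, contents: TextContent[]) => S;
}

// One block definition per root field of the document state T.
type Blocks<T> = { [K in keyof T]: BlockLike<T[K]> };

// Like Page.tsx: concatenate whatever each block extracts (blocks without the method add nothing).
function extractAll<T extends object>(blocks: Blocks<T>, states: T): TextContent[] {
    return (Object.keys(blocks) as Array<keyof T>).flatMap((key) => blocks[key].extractTextContents?.(states[key]) ?? []);
}

// Like Page.tsx: give every block the full list and let it replace its own entries.
function replaceAll<T extends object>(blocks: Blocks<T>, states: T, contents: TextContent[]): T {
    const result = { ...states };
    for (const key of Object.keys(blocks) as Array<keyof T>) {
        const replace = blocks[key].replaceTextContents;
        if (replace) result[key] = replace(states[key], contents);
    }
    return result;
}

// Usage with two toy blocks standing in for PageContentBlock and SeoBlock.
const contentBlock: BlockLike<string> = {
    extractTextContents: (state) => [{ key: "content", text: state }],
    replaceTextContents: (state, contents) => contents.find((c) => c.key === "content")?.text ?? state,
};
const seoBlock: BlockLike<{ title: string }> = {
    extractTextContents: (state) => [{ key: "seo.title", text: state.title }],
    replaceTextContents: (state, contents) => ({ title: contents.find((c) => c.key === "seo.title")?.text ?? state.title }),
};
```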
10 changes: 9 additions & 1 deletion packages/admin/admin-rte/package.json
@@ -37,21 +37,29 @@
"@mui/icons-material": "^5.0.0",
"@mui/material": "^5.0.0",
"@mui/styles": "^5.0.0",
"@testing-library/jest-dom": "^5.16.5",
"@testing-library/react": "^12.0.0",
"@types/draft-js": "^0.11.10",
"@types/immutable": "^3.8.7",
"@types/jest": "^29.5.0",
"@types/react": "^17.0.0",
"@types/react-dom": "^17.0.0",
"@types/uuid": "^9.0.2",
"draft-js": "^0.11.4",
"eslint": "^8.0.0",
"final-form": "^4.16.1",
"jest": "^29.5.0",
"jest-environment-jsdom": "^29.5.0",
"jest-junit": "^15.0.0",
"npm-run-all": "^4.1.5",
"prettier": "^2.0.0",
"react": "^17.0",
"react-dom": "^17.0",
"react-final-form": "^6.3.1",
"react-intl": "^5.10.0",
"rimraf": "^3.0.2",
"typescript": "^4.0.0"
"typescript": "^4.0.0",
"uuid": "^9.0.0"
},
"peerDependencies": {
"@mui/icons-material": "^5.0.0",
100 changes: 100 additions & 0 deletions packages/admin/admin-rte/src/core/xml/getEntityRanges.ts
@@ -0,0 +1,100 @@
import type { CharacterMetadata } from "draft-js";
import type { List } from "immutable";
import { is, OrderedSet } from "immutable";

type EntityKey = string | undefined | null;
type Style = OrderedSet<string>;
type StyleRangeWithId = [string, { style: string; id: number }[]];
type StyleRange = [string, Style];
type EntityRange = [EntityKey, Array<StyleRangeWithId>];
export type CharacterMetaList = List<CharacterMetadata>;

export const EMPTY_SET: Style = OrderedSet();

/*
This implementation is inspired by https://github.com/jpuri/draftjs-to-html.
*/
export default function getEntityRanges(text: string, charMetaList: CharacterMetaList): EntityRange[] {
    let charEntity: EntityKey = null;
    let prevCharEntity: EntityKey = null;
    const ranges: Array<EntityRange> = [];
    let rangeStart = 0;
    let lastStyle = null;
    // the id is used for the pseudo tags
    let styleId = 0;

    for (let i = 0, len = text.length; i < len; i++) {
        prevCharEntity = charEntity;
        const meta: CharacterMetadata = charMetaList.get(i);
        charEntity = meta ? meta.getEntity() : null;

        if (i > 0 && charEntity !== prevCharEntity) {
            /* Styles are always within entities */
            const styleRanges = getStyleRanges(text.slice(rangeStart, i), charMetaList.slice(rangeStart, i), lastStyle, styleId);
            styleId = styleRanges.styleId;
            ranges.push([prevCharEntity, styleRanges.styleRanges]);
            rangeStart = i;
            lastStyle = ranges[ranges.length - 1];
        }
    }

    ranges.push([charEntity, getStyleRanges(text.slice(rangeStart), charMetaList.slice(rangeStart), lastStyle, styleId).styleRanges]);

    return ranges;
}

function getStyleRanges(
    text: string,
    charMetaList: Immutable.Iterable<number, CharacterMetadata>,
    lastStyle: EntityRange | null,
    styleId: number,
): { styleRanges: StyleRangeWithId[]; styleId: number } {
    let charStyle = EMPTY_SET;
    let prevCharStyle = charStyle;
    const ranges: StyleRange[] = [];
    let rangeStart = 0;

    /* The start and end of an entity always mark a single range.
    If a style range starts before an entity range and extends into it, the last style must be used here; otherwise it will be interpreted as a new style range. */
    const lastPreviousStyleRange = lastStyle ? lastStyle[1][lastStyle[1].length - 1][1] : [];

    for (let i = 0, len = text.length; i < len; i++) {
        prevCharStyle = charStyle;
        const meta = charMetaList.get(i);
        charStyle = meta ? meta.getStyle() : EMPTY_SET;

        if (i > 0 && !is(charStyle, prevCharStyle)) {
            ranges.push([text.slice(rangeStart, i), prevCharStyle]);
            rangeStart = i;
        }
    }
    ranges.push([text.slice(rangeStart), charStyle]);

    const styleRangesWithIds: [string, { style: string; id: number }[]][] = [];

    // Add ids to the styles so that related styling tags can be identified in the export
    for (let i = 0; i < ranges.length; i++) {
        const stylesArray = ranges[i][1].toArray();

        const styles = stylesArray.map((style) => {
            // Entity ranges split the text at their boundaries, so a style that continues from the previous entity range must keep its id
            const enduringStyle = lastPreviousStyleRange.find((item) => item.style === style);

            if (enduringStyle && ranges[i - 1]?.[1].toArray().length !== 0) {
                return { style, id: enduringStyle.id };
            } else if (i > 0 && ranges[i - 1][1].toArray().includes(style)) {
                const previousStyle = styleRangesWithIds[i - 1][1].find((previousStyle) => previousStyle.style === style);
                // eslint-disable-next-line @typescript-eslint/no-non-null-assertion
                return { style, id: previousStyle!.id };
            }

            styleId += 1;

            return { style, id: styleId };
        });

        styleRangesWithIds.push([ranges[i][0], styles]);
    }

    return { styleRanges: styleRangesWithIds, styleId };
}
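`getStyleRanges` does two things per entity range: it splits the text wherever the set of inline styles changes, and it numbers each style so that a style spanning several consecutive ranges keeps one id, which lets the exported opening and closing pseudo tags be matched on import. The sketch below isolates that idea as a stand-alone function. It is a simplification, not the PR's code: `getStyledRuns` and `StyledRun` are hypothetical names, it takes a plain `charStyles` array of style names instead of draft-js `CharacterMetadata`, it compares style sets without regard to order, and it ignores entity ranges and the carry-over of ids across them (`lastStyle`).

```typescript
// A run of consecutive characters sharing the same set of inline styles.
interface StyledRun {
    text: string;
    styles: { style: string; id: number }[];
}

// charStyles[i] lists the inline styles applied to text[i].
function getStyledRuns(text: string, charStyles: string[][]): StyledRun[] {
    const sameStyles = (a: string[], b: string[]) => a.length === b.length && a.every((s) => b.includes(s));

    // 1. Split the text wherever the style set changes.
    const ranges: [string, string[]][] = [];
    let rangeStart = 0;
    for (let i = 1; i <= text.length; i++) {
        if (i === text.length || !sameStyles(charStyles[i] ?? [], charStyles[i - 1] ?? [])) {
            ranges.push([text.slice(rangeStart, i), charStyles[i - 1] ?? []]);
            rangeStart = i;
        }
    }

    // 2. Number the styles: a style that continues from the previous run keeps its id,
    //    a style that starts in this run gets the next id (ids start at 1).
    let nextId = 0;
    const runs: StyledRun[] = [];
    for (const [runText, styles] of ranges) {
        const previous = runs[runs.length - 1];
        runs.push({
            text: runText,
            styles: styles.map((style) => {
                const continued = previous?.styles.find((s) => s.style === style);
                return continued ?? { style, id: ++nextId };
            }),
        });
    }
    return runs;
}
```

For "Hello" with bold on "Hell" and italic on "ll", the bold style keeps id 1 across the "He" and "ll" runs while italic gets id 2; a bold style that resumes after an unstyled gap receives a fresh id, just as a new style range does in `getStyleRanges`.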