Class MarkdownHeaderTextSplitter

java.lang.Object
com.hw.langchain.text.splitter.MarkdownHeaderTextSplitter

public class MarkdownHeaderTextSplitter extends Object
Splits Markdown files into chunks based on a specified set of headers.
Author:
HamaWhite
  • Constructor Details

    • MarkdownHeaderTextSplitter

      public MarkdownHeaderTextSplitter(List<org.apache.commons.lang3.tuple.Pair<String,String>> headersToSplitOn)
    • MarkdownHeaderTextSplitter

      public MarkdownHeaderTextSplitter(List<org.apache.commons.lang3.tuple.Pair<String,String>> headersToSplitOn, boolean returnEachLine)
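The constructors take headersToSplitOn as a list of commons-lang3 Pair<String,String>, but the page does not document what each pair means. Assuming, as in the Python LangChain splitter this class appears to mirror, that each pair maps a Markdown header prefix such as "#" to a metadata name such as "Header 1", the following minimal sketch shows how such a list could be used to recognise header lines. HeaderMatcherSketch and match are hypothetical names, and Map.Entry stands in for Pair so the example has no external dependencies.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not the library's code: recognises header lines from a
 * headersToSplitOn-style list. Map.Entry stands in for commons-lang3 Pair:
 * key = Markdown prefix (e.g. "##"), value = metadata name (e.g. "Header 2").
 */
public class HeaderMatcherSketch {

    private final List<Map.Entry<String, String>> headers;

    public HeaderMatcherSketch(List<Map.Entry<String, String>> headersToSplitOn) {
        // Copy, then try longer prefixes first so that, if one configured
        // prefix extends another, the more specific one wins.
        this.headers = new ArrayList<>(headersToSplitOn);
        this.headers.sort(Comparator.comparingInt(
                (Map.Entry<String, String> e) -> e.getKey().length()).reversed());
    }

    /** Returns the matching (prefix, name) entry, or null if the line is not a configured header. */
    public Map.Entry<String, String> match(String line) {
        String stripped = line.strip();
        for (Map.Entry<String, String> header : headers) {
            String prefix = header.getKey();
            // A header is the prefix alone or the prefix followed by a space,
            // so "#hashtag" is not treated as a header.
            if (stripped.equals(prefix) || stripped.startsWith(prefix + " ")) {
                return header;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        HeaderMatcherSketch matcher = new HeaderMatcherSketch(List.of(
                Map.entry("#", "Header 1"),
                Map.entry("##", "Header 2")));
        System.out.println(matcher.match("## Usage").getValue()); // Header 2
        System.out.println(matcher.match("# Intro").getValue());  // Header 1
        System.out.println(matcher.match("#hashtag"));            // null
    }
}
```

With the documented constructor, the same configuration would presumably be written as List.of(Pair.of("#", "Header 1"), Pair.of("##", "Header 2")); the metadata-name reading of the second element is the assumption stated above.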
  • Method Details

    • aggregateLinesToChunks

      public List<Document> aggregateLinesToChunks(List<LineType> lines)
      Combine lines with common metadata into chunks.
      Parameters:
      lines - lines of text, each paired with its associated header metadata
      Returns:
      List of Document chunks
    • splitText

      public List<Document> splitText(String text)
      Split a Markdown file into chunks.
      Parameters:
      text - content of the Markdown file
      Returns:
      List of Document chunks
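The two methods above can be illustrated with a simplified sketch, under the same assumption that each headersToSplitOn pair maps a prefix such as "#" to a metadata name. splitText tags every non-blank, non-header line with the headers currently in scope, and aggregateLinesToChunks merges consecutive lines whose metadata is equal. MarkdownSplitSketch and Chunk (a stand-in for Document) are hypothetical, and the sketch ignores details a real splitter likely handles, such as "#" lines inside fenced code blocks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified sketch of header-based Markdown splitting.
 * Not the langchain-java implementation; Chunk stands in for Document.
 */
public class MarkdownSplitSketch {

    /** A block of text plus the header metadata in effect for it. */
    record Chunk(String content, Map<String, String> metadata) {}

    /** Header prefix (e.g. "##") to metadata name (e.g. "Header 2"). */
    private final List<Map.Entry<String, String>> headers;

    MarkdownSplitSketch(List<Map.Entry<String, String>> headersToSplitOn) {
        this.headers = headersToSplitOn;
    }

    /** Tags each non-header, non-blank line with the headers in scope, then aggregates. */
    List<Chunk> splitText(String text) {
        List<Chunk> lines = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        for (String raw : text.split("\n")) {
            String line = raw.strip();
            int index = matchHeader(line);
            if (index >= 0) {
                String prefix = headers.get(index).getKey();
                // A new header closes any open header of the same or deeper level
                // (level approximated by prefix length, which fits "#"-style prefixes).
                for (Map.Entry<String, String> h : headers) {
                    if (h.getKey().length() >= prefix.length()) {
                        current.remove(h.getValue());
                    }
                }
                current.put(headers.get(index).getValue(), line.substring(prefix.length()).strip());
            } else if (!line.isEmpty()) {
                // Snapshot the metadata so later headers do not mutate earlier lines.
                lines.add(new Chunk(line, new LinkedHashMap<>(current)));
            }
        }
        return aggregateLinesToChunks(lines);
    }

    /** Merges consecutive lines whose metadata is equal into one chunk. */
    List<Chunk> aggregateLinesToChunks(List<Chunk> lines) {
        List<Chunk> chunks = new ArrayList<>();
        for (Chunk line : lines) {
            int last = chunks.size() - 1;
            if (last >= 0 && chunks.get(last).metadata().equals(line.metadata())) {
                Chunk prev = chunks.get(last);
                chunks.set(last, new Chunk(prev.content() + "\n" + line.content(), prev.metadata()));
            } else {
                chunks.add(line);
            }
        }
        return chunks;
    }

    /** Index of the longest configured prefix this line starts with, or -1 if none. */
    private int matchHeader(String line) {
        int best = -1;
        for (int i = 0; i < headers.size(); i++) {
            String prefix = headers.get(i).getKey();
            boolean matches = line.equals(prefix) || line.startsWith(prefix + " ");
            if (matches && (best < 0 || prefix.length() > headers.get(best).getKey().length())) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        MarkdownSplitSketch splitter = new MarkdownSplitSketch(List.of(
                Map.entry("#", "Header 1"), Map.entry("##", "Header 2")));
        String md = "# Intro\nHello\nWorld\n## Details\nMore text\n# Next\nEnd";
        for (Chunk chunk : splitter.splitText(md)) {
            System.out.println(chunk.metadata() + " -> " + chunk.content().replace("\n", " | "));
        }
    }
}
```

Running main prints three chunks: "{Header 1=Intro} -> Hello | World", "{Header 1=Intro, Header 2=Details} -> More text" and "{Header 1=Next} -> End". The returnEachLine constructor flag plausibly corresponds to skipping the aggregation step and returning one chunk per line, as return_each_line does in the Python splitter; that correspondence is an assumption, since this page does not document the flag.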