Converting Markdown files to PDF with Python - Oboe吹きプログラマの黙示録

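This script converts the md files in the directory where the Python script is run into PDF.
Each file is first converted to HTML, and the HTML is then converted to PDF.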
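(Conditions)
The style applied to the generated HTML can be anything you like,
and is specified in a single stylesheet file.
In what follows, the stylesheet file is assumed to exist under the directory
where the script is run, with the file name
  wkstyle.less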
(Preparation)

pip install Markdown
pip install pdfkit
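
pdfkit only drives the wkhtmltopdf command-line tool, so that binary has to be installed separately from the two pip packages. Below is a minimal sketch for checking that it is reachable on PATH. The helper name find_wkhtmltopdf is made up for this example and is not part of pdfkit.

```python
import shutil


def find_wkhtmltopdf():
    """Return the full path of the wkhtmltopdf executable, or None if it is not on PATH."""
    return shutil.which("wkhtmltopdf")


if __name__ == "__main__":
    path = find_wkhtmltopdf()
    if path is None:
        print("wkhtmltopdf was not found on PATH; pdfkit will not be able to produce PDFs")
    else:
        print("wkhtmltopdf found at", path)
```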

Python source

# -*- coding: UTF-8 -*-
import markdown
import pdfkit
import base64
import re
import os
import glob

options = {
    'page-size': 'A4',
    'margin-top': '0.75in',
    'margin-right': '0.75in',
    'margin-bottom': '0.75in',
    'margin-left': '0.75in',
    'encoding': "UTF-8"
}

# Markdown extensions: tables and table of contents
mdextensions = [ "tables", "toc" ]

# Walk the whole directory tree
def find_all(directory):
    for root, dirs, files in os.walk(directory):
        yield root
        for file in files:
            yield os.path.join(root, file)

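# Return the relative path of the first matching file found under the current directory
def onefind_path(filename):
    for path in find_all('.'):
        # Match on the exact file name and skip directories
        if os.path.basename(path) == filename and os.path.isfile(path):
            return path
    # Exit with an error message when the file does not exist
    raise SystemExit(filename + ' was not found under the current directory')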

# Image file → Base64-encoded string
def imageToB64encode(path):
    with open(path, 'rb') as f:
        return base64.b64encode(f.read()).decode()

# md file → PDF
def convertPDF(mdpath, pdfpath, style):
    with open(mdpath, 'rt', encoding="utf-8") as f:
        text = f.read()
    # Remove @import lines from the Markdown
    text = re.sub('@import ".+"\n', '', text)
    # Convert to HTML
    body = markdown.Markdown(extensions=mdextensions).convert(text)

    # Embed each local image as <img src="data:image/png;base64,<Base64 string>"/>
    def embed(m):
        src = m.group(2)
        # Leave already-embedded and remote images untouched
        if re.match(r'(data:|https?://)', src):
            return m.group(0)
        ext = os.path.splitext(src)[1].lstrip('.').lower()
        subtype = 'jpeg' if ext == 'jpg' else ext
        return m.group(1) + 'data:image/' + subtype + ';base64,' + imageToB64encode(src) + '"'

    # [^"]+ stops at the closing quote, so several images on one line are handled separately
    body = re.sub(r'(<img\b[^>]*?\bsrc=")([^"]+)"', embed, body)
    html = '<html lang="ja"><meta charset="utf-8">'
    html += '<style>' + style + '</style>'
    html += '<body>' + body + '</body></html>'
    # Write the PDF
    pdfkit.from_string(html, pdfpath, options=options)

#########################
if __name__ == '__main__':
    # The style is always read from wkstyle.less
    with open(onefind_path('wkstyle.less'), 'rt', encoding="utf-8") as f:
        wkstyle = f.read()
    for mdfile in glob.glob("*.md"):
        # Raw string so that \. is a regex escape, not an invalid string escape
        pdfname = re.sub(r'\.md$', '.pdf', mdfile)
        convertPDF(mdfile, pdfname, wkstyle)
        print("-----------------------------------")
        print("%s → %s" % (mdfile, pdfname))
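
The image step in the script turns each local image into a data: URI so that the HTML handed to pdfkit carries its images inline. The sketch below shows that step in isolation. It swaps in the standard library's mimetypes.guess_type to choose the MIME type, which is a substitution and not what the script above does, and the function name to_data_uri is hypothetical.

```python
import base64
import mimetypes


def to_data_uri(data, filename):
    """Build a data: URI for raw image bytes, guessing the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        # Assumption for this sketch: fall back to a generic binary type
        mime = "application/octet-stream"
    return "data:" + mime + ";base64," + base64.b64encode(data).decode("ascii")


# Three dummy bytes stand in for real image data
print(to_data_uri(b"abc", "logo.png"))  # data:image/png;base64,YWJj
```

Guessing from the full extension avoids the cases where the last three characters of the file name are not a valid image subtype, such as .jpeg.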

So that a table written in Markdown becomes a table element in the HTML,
and therefore a table in the converted PDF, the tables extension is specified.
To add a table of contents as well, toc is added.

mdextensions = [ "tables", "toc" ]

markdown.Markdown(extensions=mdextensions)

For the table of contents to actually be generated, write the following in the md file:

[TOC]

Write [TOC] at the place where the table of contents should appear, and always leave one blank line before it.

A sample wkstyle.less is shown below.
It adds heading numbers to the h2, h3, h4, and h5 tags.

body{
    font-size: 1rem;
    counter-reset: chapter1;
}
h2 {
  counter-reset: chapter2;
}
h3 {
  counter-reset: chapter3;
}
h4 {
  counter-reset: chapter4;
}
h5 {
}
h2::before {
    counter-increment: chapter1;
    content: counter(chapter1) ". ";
}
h3::before {
    counter-increment: chapter2;
    content: counter(chapter1) "." counter(chapter2) ". ";
}
h4::before {
    counter-increment: chapter3;
    content: counter(chapter1) "." counter(chapter2) "." counter(chapter3) ". ";
}
h5::before {
    counter-increment: chapter4;
    content: counter(chapter1) "." counter(chapter2) "." counter(chapter3) "." counter(chapter4) ". ";
}
table{
   border-spacing: 0; border-collapse: collapse;
}
th, td{
   border: 1px solid #000000;
}
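
To see what numbering these counter rules should produce, the increment and reset logic can be replayed in a few lines of Python. This is only an illustrative model of the stylesheet above, not code used by the converter, and the function name number_headings is made up for the example.

```python
def number_headings(levels):
    """Return the numeric prefix for each heading level (2..5), in document order."""
    counters = [0, 0, 0, 0]  # chapter1 .. chapter4
    prefixes = []
    for level in levels:
        idx = level - 2                # h2 -> chapter1, h3 -> chapter2, ...
        counters[idx] += 1             # counter-increment in hN::before
        if idx + 1 < len(counters):
            counters[idx + 1] = 0      # counter-reset on hN for the next level down
        prefixes.append("".join(str(c) + "." for c in counters[:idx + 1]) + " ")
    return prefixes


# Headings in order: h2, h3, h3, h2, h3, h4
print(number_headings([2, 3, 3, 2, 3, 4]))
# ['1. ', '1.1. ', '1.2. ', '2. ', '2.1. ', '2.1.1. ']
```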