What is Markdown?
Markdown is a lightweight markup language designed for creating formatted text using a plain-text editor. It is widely used for documentation, blogging, and other text-based content due to its simplicity and readability.
Markdown was created by John Gruber in collaboration with Aaron Swartz in 2004. The creators designed Markdown to provide an easy-to-read and easy-to-write format that could be converted to HTML seamlessly. The goal was to allow writers to focus on content without dealing with complex markup syntax like HTML. The original description and syntax were published on Gruber’s website [1].
Aaron Swartz was a visionary programmer and internet activist who made significant contributions to technology and digital freedom. He co-authored RSS 1.0, a web feed format that allows easy distribution of content. In addition to co-creating Markdown, he was a co-founder of Reddit, contributing to its growth before the platform was acquired by Condé Nast. Swartz later co-founded Demand Progress, an organization dedicated to fighting internet censorship, most notably opposing the controversial SOPA and PIPA bills.
Swartz’s advocacy stemmed from his belief in open access to information and his commitment to making knowledge freely available to the public. However, his ideals led to legal troubles when he downloaded a large number of academic articles from JSTOR, resulting in intense scrutiny and legal pressure. Tragically, Aaron Swartz died by suicide in 2013 at the age of 26 [2].
Key Features:
1. Easy to Write: Uses simple syntax that is human-readable even without rendering.
2. Portable: Compatible across various platforms and tools.
3. Lightweight: Requires no special software, and files are often saved with the .md or .markdown extension.
Common Syntax:
• Headings: # Heading 1, ## Heading 2, …, ###### Heading 6
• Bold: **Bold Text** or __Bold Text__
• Italic: *Italic Text* or _Italic Text_
• Lists:
• Unordered: - Item or * Item
• Ordered: 1. Item, 2. Item
• Links: [Link Text](https://example.com)
• Images: ![Alt Text](image_url.jpg)
• Code:
• Inline: `code`
• Block: use triple backticks (```) or indent the lines with 4 spaces.
Example with triple backticks:
```language
Code goes here
```
• Blockquotes: > This is a quote
• Horizontal Lines: --- or ***
Common Uses:
• Writing README files for GitHub projects.
• Documentation for software and tools.
• Blogging platforms like Ghost, and static site generators like Jekyll or Hugo.
• Note-taking apps like Obsidian.
Markdown can be easily converted to HTML, making it versatile for web content.
Extended Markdown Versions
There are several flavors or extensions of Markdown that add extra features:
• CommonMark: A standardized version of Markdown for consistency.
• GitHub-Flavored Markdown (GFM): Adds support for features like tables, task lists, and strikethrough.
• Markdown Extra: Allows additional features like footnotes and definition lists.
• Pandoc Markdown: Designed for advanced publishing workflows, including citations.
Advanced Markdown Syntax
Markdown can be extended with richer formatting.
• Strikethrough:
Use ~~ for strikethrough text.
Example: ~~Strikethrough~~ → Strikethrough
• Tables:
Use pipes | and hyphens - to create tables:
| Name  | Age | City     |
|-------|-----|----------|
| Alice | 25  | New York |
| Bob   | 30  | Chicago  |
Output:
| Name | Age | City |
|---|---|---|
| Alice | 25 | New York |
| Bob | 30 | Chicago |
• Task Lists (GFM Only):
Use - [ ] for incomplete tasks and - [x] for completed tasks.
- [x] Task 1
- [ ] Task 2
Output:
☑ Task 1
☐ Task 2
• Footnotes:
Add references using square brackets with a caret (^). Footnotes are an extension, so support depends on the Markdown flavor.
Here is a reference[^1].
[^1]: This is the footnote text.
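Pipe tables like the one above are tedious to type by hand, so they are often generated. Below is a minimal Python sketch of that idea. The function name make_table is hypothetical (it is not part of any Markdown library), and the code assumes a GFM-style renderer where a literal pipe inside a cell is escaped with a backslash.

```python
def make_table(headers, rows):
    """Build a GFM-style pipe table from a list of headers and a list of rows."""

    def fmt(cells):
        # Escape literal pipes so they are not read as column separators.
        escaped = [str(c).replace("|", "\\|") for c in cells]
        return "| " + " | ".join(escaped) + " |"

    # Header row, then the separator row that marks the table as a table.
    lines = [fmt(headers), fmt(["---"] * len(headers))]
    for row in rows:
        lines.append(fmt(row))
    return "\n".join(lines)


table = make_table(
    ["Name", "Age", "City"],
    [["Alice", 25, "New York"], ["Bob", 30, "Chicago"]],
)
print(table)
```

The columns are not padded to equal width, since renderers only care about the pipes and the separator row.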
Tools to Work with Markdown
• Editors:
• Typora, Obsidian, Mark Text: User-friendly Markdown editors.
• VS Code: Supports Markdown natively, including a built-in preview; extensions add further features.
• Converters:
• Pandoc: Converts Markdown to PDF, HTML, Word, and more.
• Markdown-it: Renders Markdown in web applications.
Markdown to HTML Conversion
Markdown can be converted into HTML for use on the web. Example:
# Heading 1
This is **bold text** and *italic text*.
Converts to:
<h1>Heading 1</h1>
<p>This is <strong>bold text</strong> and <em>italic text</em>.</p>
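To make the conversion less of a black box, here is a minimal Python sketch that handles only the three constructs in the example: ATX headings, **bold**, and *italic*. It is an illustration under simplifying assumptions, not how real converters work: every non-empty line becomes its own paragraph, and lists, links, code, and nested emphasis are ignored. The function names inline and to_html are hypothetical. For real content, use an established parser such as Pandoc or markdown-it.

```python
import html
import re


def inline(text):
    """Convert **bold** and *italic* spans after escaping HTML special characters."""
    text = html.escape(text, quote=False)
    # Bold must be handled before italic, because ** also contains *.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


def to_html(markdown_text):
    """Convert headings and simple paragraphs, one output element per line."""
    out = []
    for line in markdown_text.splitlines():
        line = line.strip()
        if not line:
            continue
        heading = re.match(r"(#{1,6})\s+(.*)", line)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{inline(heading.group(2))}</h{level}>")
        else:
            out.append(f"<p>{inline(line)}</p>")
    return "\n".join(out)


sample = "# Heading 1\nThis is **bold text** and *italic text*."
print(to_html(sample))
```

Run on the sample above, this prints the same two HTML lines shown in the example.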
Useful tool – MarkItDown by Microsoft
The MarkItDown library [3] is a utility for converting various file types to Markdown (e.g., for indexing or text analysis).
It presently supports:
- PDF (.pdf)
- PowerPoint (.pptx)
- Word (.docx)
- Excel (.xlsx)
- Images (EXIF metadata, and OCR)
- Audio (EXIF metadata, and speech transcription)
- HTML (special handling of Wikipedia, etc.)
- Various other text-based formats (csv, json, xml, etc.)
- ZIP (Iterates over contents and converts each file)
Installation
You can install markitdown using pip:
pip install markitdown
or from source, by running this in the root of a cloned copy of the repository:
pip install -e .
Usage
The API is simple:
from markitdown import MarkItDown
markitdown = MarkItDown()
result = markitdown.convert("test.xlsx")
print(result.text_content)
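The same two calls extend naturally to a whole folder of documents. The sketch below is one possible way to do that; the helper name convert_folder, the folder names, and the default list of suffixes are assumptions made for illustration. It relies only on the convert() method and the text_content attribute shown above, and takes the converter as an argument.

```python
from pathlib import Path


def convert_folder(converter, src_dir, dst_dir,
                   suffixes=(".pdf", ".docx", ".xlsx", ".pptx")):
    """Convert each matching file in src_dir and write one .md file per input.

    `converter` is any object with a convert(path) method whose result
    has a text_content attribute, such as a MarkItDown instance.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.iterdir()):
        if path.is_file() and path.suffix.lower() in suffixes:
            result = converter.convert(str(path))
            target = dst / (path.stem + ".md")
            target.write_text(result.text_content, encoding="utf-8")
            written.append(target)
    return written


# Example usage (assumes markitdown is installed and a "docs" folder exists):
# from markitdown import MarkItDown
# for out in convert_folder(MarkItDown(), "docs", "markdown"):
#     print(out)
```

Note that two inputs with the same base name (for example report.pdf and report.docx) would write to the same output file in this sketch.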
To use this as a command-line utility, install it and then run it like this:
markitdown path-to-file.pdf
This will output Markdown to standard output. You can save it like this:
markitdown path-to-file.pdf > document.md
You can pipe content to standard input by omitting the argument:
cat path-to-file.pdf | markitdown
You can also configure markitdown to use Large Language Models to describe images. To do so, you must provide the llm_client and llm_model parameters to the MarkItDown object, according to your specific client.
from markitdown import MarkItDown
from openai import OpenAI
client = OpenAI()
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
result = md.convert("example.jpg")
print(result.text_content)
You can also run the project as a Docker image:
docker build -t markitdown:latest .
docker run --rm -i markitdown:latest < ~/your-file.pdf > output.md
References
[1] https://daringfireball.net/projects/markdown/