Markdown

What is Markdown

Markdown is a lightweight markup language designed for creating formatted text using a plain-text editor. It is widely used for documentation, blogging, and other text-based content due to its simplicity and readability.

Markdown was created by John Gruber in collaboration with Aaron Swartz in 2004. The creators designed Markdown to provide an easy-to-read and easy-to-write format that could be converted to HTML seamlessly. The goal was to allow writers to focus on content without dealing with complex markup syntax like HTML. The original description and syntax were published on Gruber’s website [1].

Aaron Swartz was a visionary programmer and internet activist who made significant contributions to technology and digital freedom. He was a co-creator of RSS 1.0, a web feed format that allowed for the easy distribution of content. In addition to co-creating Markdown, he was a co-founder of Reddit, contributing to its growth before the platform was acquired by Condé Nast. Swartz later co-founded Demand Progress, an organization dedicated to fighting internet censorship, most notably opposing the controversial SOPA and PIPA bills.

Swartz’s advocacy stemmed from his belief in open access to information and his commitment to making knowledge freely available to the public. However, his ideals led to legal troubles when he downloaded a large number of academic articles from JSTOR, resulting in intense scrutiny and legal pressure. Tragically, Aaron Swartz died by suicide in 2013 at the age of 26 [2].

Key Features:

1. Easy to Write: Uses simple syntax that is human-readable even without rendering.

2. Portable: Compatible across various platforms and tools.

3. Lightweight: Requires no special software, and files are often saved with the .md or .markdown extension.

Common Syntax:

Headings: # Heading 1, ## Heading 2, …, ###### Heading 6

Bold: **Bold Text** or __Bold Text__

Italic: *Italic Text* or _Italic Text_

Lists:

• Unordered: - Item or * Item

• Ordered: 1. Item, 2. Item

Links: [Link Text](https://example.com)

Images: ![Alt Text](image_url.jpg)

Code:

• Inline: `code`

  • Block Code Syntax

To create a code block in Markdown, use triple backticks (` ` `) or indent the lines with 4 spaces.

Example with Triple Backticks:

` ` `language

Code goes here

` ` `

Blockquotes: > This is a quote

Horizontal Lines: --- or ***
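Because the heading syntax is just a run of one to six # characters followed by text, a document outline can be pulled out of a Markdown file with a few lines of Python. The following is a minimal sketch, not a full parser: the function name `outline` and the sample text are made up for illustration, and it only recognises #-style headings outside fenced code blocks.

```python
import re

# A heading line: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
FENCE = "`" * 3  # marker that opens and closes a fenced code block


def outline(markdown_text):
    """Return (level, title) pairs for each '#' heading in the text."""
    headings = []
    in_fence = False
    for line in markdown_text.splitlines():
        # Skip fenced code so '#' comments are not mistaken for headings.
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings


sample = "# Title\n\nIntro text.\n\n## Section A\n\n### Detail\n\n## Section B\n"
for level, title in outline(sample):
    # Indent each entry by its heading level to show the hierarchy.
    print("  " * (level - 1) + "- " + title)
```

The printed result is itself a nested Markdown list, so it can be pasted into a document as a simple table of contents.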

Common Uses:

• Writing README files for GitHub projects.

• Documentation for software and tools.

• Blogging platforms like Medium or Ghost.

• Note-taking apps like Obsidian.

Markdown can be easily converted to HTML, making it versatile for web content.

Extended Markdown Versions

There are several flavors or extensions of Markdown that add extra features:

CommonMark: A standardized version of Markdown for consistency.

GitHub-Flavored Markdown (GFM): Adds support for features like tables, task lists, and strikethrough.

Markdown Extra: Allows additional features like footnotes and definition lists.

Pandoc Markdown: Designed for advanced publishing workflows, including citations.

Advanced Markdown Syntax

Markdown can be extended with richer formatting.

Strikethrough:

Use ~~ for strikethrough text.

Example: ~~Strikethrough~~ → Strikethrough

Tables:

Use pipes | and dashes - to create tables:

| Name  | Age | City     |
|-------|-----|----------|
| Alice | 25  | New York |
| Bob   | 30  | Chicago  |

Output:

Name    Age   City
Alice   25    New York
Bob     30    Chicago

Example output of pipes and dashes in Markdown
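Writing the pipes and dashes by hand gets tedious for larger tables, so it can help to generate them. The sketch below builds a pipe table from plain Python lists; `to_markdown_table` is a hypothetical helper written for this article, not part of any library, and it assumes every row has the same number of cells as the header.

```python
def to_markdown_table(headers, rows):
    """Build a pipe table from a header list and a list of rows."""
    # Convert every cell to text so numbers can be passed in directly.
    table = [[str(cell) for cell in headers]]
    table += [[str(cell) for cell in row] for row in rows]
    # Each column is as wide as its longest cell (at least 3 for the dashes).
    widths = [max(3, *(len(row[i]) for row in table)) for i in range(len(headers))]

    def fmt(cells):
        padded = (cell.ljust(width) for cell, width in zip(cells, widths))
        return "| " + " | ".join(padded) + " |"

    lines = [fmt(table[0])]
    # The separator row of dashes is what marks the first row as a header.
    lines.append("| " + " | ".join("-" * width for width in widths) + " |")
    lines.extend(fmt(row) for row in table[1:])
    return "\n".join(lines)


result = to_markdown_table(
    ["Name", "Age", "City"],
    [["Alice", 25, "New York"], ["Bob", 30, "Chicago"]],
)
print(result)
```

Padding the cells is optional as far as renderers are concerned, but it keeps the raw text readable, which is the point of Markdown.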

Task Lists (GFM Only):

Use - [ ] for incomplete tasks and - [x] for completed tasks.

- [x] Task 1

- [ ] Task 2

Output:

  ☑ Task 1
  ☐ Task 2
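Since a task item is ordinary text with a fixed prefix, progress can be tallied straight from the source file. This is a rough sketch under that assumption: `parse_tasks` and its regular expression are illustrative only and do not cover every case a real GFM parser handles (nested lists, numbered items, and so on).

```python
import re

# A task item: a list marker, then "[ ]" or "[x]", then the task text.
TASK = re.compile(r"^\s*[-*+]\s+\[([ xX])\]\s+(.*\S)\s*$")


def parse_tasks(markdown_text):
    """Return (text, is_done) pairs for each task-list item found."""
    tasks = []
    for line in markdown_text.splitlines():
        match = TASK.match(line)
        if match:
            tasks.append((match.group(2), match.group(1).lower() == "x"))
    return tasks


sample = "- [x] Task 1\n- [ ] Task 2\n"
tasks = parse_tasks(sample)
done = sum(1 for _, is_done in tasks if is_done)
print(f"{done} of {len(tasks)} tasks complete")
```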

Footnotes:

Add references using square brackets and caret ^.

Here is a reference[^1].  

[^1]: This is the footnote text.

Tools to Work with Markdown

Editors:

Typora, Obsidian, Mark Text: User-friendly Markdown editors.

VS Code: Supports Markdown natively, including a built-in preview, with extensions for additional features.

Converters:

Pandoc: Converts Markdown to PDF, HTML, Word, and more.

markdown-it: A JavaScript parser that renders Markdown in web applications.

Markdown to HTML Conversion

Markdown can be converted into HTML for use on the web. Example:

# Heading 1

This is **bold text** and *italic text*.

Converts to:

<h1>Heading 1</h1>

<p>This is <strong>bold text</strong> and <em>italic text</em>.</p>
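To make the mapping concrete, here is a deliberately tiny converter that handles only the constructs in the example above: # headings, paragraphs, **bold** and *italic*. It is a toy sketch for illustration (the function names are invented here), not a substitute for a real parser such as markdown-it or Pandoc, which deal with nesting, escaping, lists, links, and many edge cases this ignores.

```python
import html
import re


def render_inline(text):
    """Escape HTML characters, then convert **bold** and *italic* spans."""
    text = html.escape(text, quote=False)
    # Bold must be handled before italic, because ** also contains *.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


def to_html(markdown_text):
    """Convert headings and plain paragraphs; everything else is out of scope."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", markdown_text.strip()):
        heading = re.match(r"^(#{1,6})\s+(.+)$", block)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{render_inline(heading.group(2))}</h{level}>")
        else:
            # Join wrapped lines so a paragraph becomes a single <p> element.
            blocks.append(f"<p>{render_inline(' '.join(block.split()))}</p>")
    return "\n".join(blocks)


source = "# Heading 1\n\nThis is **bold text** and *italic text*."
print(to_html(source))
```

Run on the two-line sample, this prints the same two HTML lines shown above.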

Useful tool – MarkItDown by Microsoft

The MarkItDown library [3] is a utility for converting various file types to Markdown (e.g., for indexing or text analysis).

It presently supports:

  • PDF (.pdf)
  • PowerPoint (.pptx)
  • Word (.docx)
  • Excel (.xlsx)
  • Images (EXIF metadata, and OCR)
  • Audio (EXIF metadata, and speech transcription)
  • HTML (special handling of Wikipedia, etc.)
  • Various other text-based formats (csv, json, xml, etc.)
  • ZIP (Iterates over contents and converts each file)

Installation

You can install markitdown using pip:

pip install markitdown

or, from a local clone of the source repository:

pip install -e .

Usage

The API is simple:

from markitdown import MarkItDown

# Create a converter and convert a spreadsheet to Markdown.
md = MarkItDown()
result = md.convert("test.xlsx")
print(result.text_content)
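Because the API comes down to one convert call that returns an object with a text_content attribute, converting a whole folder is mostly file handling. The sketch below assumes exactly that interface and nothing more; `convert_folder`, the "*.docx" default pattern, and the folder names in the usage comment are placeholders chosen for illustration, not part of MarkItDown.

```python
from pathlib import Path


def convert_folder(converter, source_dir, output_dir, pattern="*.docx"):
    """Convert every matching file in source_dir and write .md files to output_dir.

    `converter` is any object with a convert(path) method whose result has a
    text_content attribute, such as a MarkItDown instance.
    """
    output = Path(output_dir)
    output.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(source_dir).glob(pattern)):
        result = converter.convert(str(path))
        # Keep the original file name and swap the extension for .md.
        target = output / (path.stem + ".md")
        target.write_text(result.text_content, encoding="utf-8")
        written.append(target)
    return written


# Usage (requires `pip install markitdown`; folder names are placeholders):
#   from markitdown import MarkItDown
#   convert_folder(MarkItDown(), "reports", "reports_md")
```

Passing the converter in as an argument keeps the helper independent of how the MarkItDown object is configured.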

To use this as a command-line utility, install it and then run it like this:

markitdown path-to-file.pdf

This will output Markdown to standard output. You can save it like this:

markitdown path-to-file.pdf > document.md

You can pipe content to standard input by omitting the argument:

cat path-to-file.pdf | markitdown

You can also configure markitdown to use Large Language Models to describe images. To do so, provide the llm_client and llm_model parameters to the MarkItDown object, according to your specific client:

from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
result = md.convert("example.jpg")
print(result.text_content)

You can also run the project as a Docker image:

docker build -t markitdown:latest .
docker run --rm -i markitdown:latest < ~/your-file.pdf > output.md

References

[1] https://daringfireball.net/projects/markdown/

[2] https://en.wikipedia.org/wiki/Aaron_Swartz

[3] https://github.com/microsoft/markitdown
