PubCSS: Formatting Academic Publications in HTML & CSS

You have two choices when it comes to formatting academic papers for conferences and journals: Microsoft Word and LaTeX.

Word is familiar and easy for anyone to pick up. But the WYSIWYG interface that makes it so easy also makes it easy to create inconsistent, amateur-looking documents that are tough to maintain and fine-tune. On top of this, you get a file format that isn’t friendly to revision control, which hampers collaboration through a system like GitHub.

LaTeX is powerful and produces beautiful documents, but requires installing a hefty toolchain, learning a new syntax, and familiarizing yourself with its powerful abstractions. This is a significant up-front investment, one that may not pay off. In my experience, there’s usually at least one author who hasn’t made that investment, and so the team falls back to Word anyway.

PubCSS is an exploration of HTML and CSS as a third option. It’s a library of stylesheets and templates for formatting academic papers. Perhaps HTML and CSS can hit the sweet spot between the ease of Word and the efficiency of LaTeX. Like LaTeX, HTML is a markup language, but one that a lot more people have experience with. CSS has proven its worth for styling fluid layouts, and with CSS3 modules around the corner, things can only improve for print.

Output

So how does PubCSS’s output look? Pretty darn good. Here’s a sample page for the ACM SIG proceedings. Note that most of the layout, typography, and numbering are handled by a single stylesheet.

acm-sig-sample-latex

You can compare the actual PDF output for the following formats:

A bonus of using HTML is that with a few lines of CSS targeting @media screen, you’ve got yourself a web version of the paper.
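
As a rough illustration of what those few lines might look like, here is a minimal sketch. The selectors, widths, and font are assumptions chosen for the example, not PubCSS’s actual rules. The point is only that rules inside @media screen apply in the browser, while Prince typesets with print media by default and so leaves the PDF untouched.

```html
<style>
  /* Applies only on screen; Prince uses print media by default, so the PDF is unaffected */
  @media screen {
    body {
      max-width: 40em;             /* assumed comfortable line length */
      margin: 0 auto;              /* center the paper in the window */
      padding: 1em;
      font-family: Georgia, serif; /* assumed screen font */
    }
    /* Keep figures from overflowing narrow windows */
    img {
      max-width: 100%;
      height: auto;
    }
  }
</style>
```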

Web Paper

You can also add niceties like responsive design, reference tooltips, and image lightboxes to create an interactive web paper that feels at home on the web.

Web Paper Theme

Quick Start

Using PubCSS is simple:

  1. Create an HTML file, or modify one of the templates, to add your content
  2. Link your HTML to a pub.css stylesheet
  3. Build a PDF from the HTML using Prince with the command
    prince input.html output.pdf

You can grab the templates and stylesheets from the GitHub repo. ACM and IEEE formats are available so far. The only dependency is Prince, which is free for non-commercial use.
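
To make steps 1 and 2 concrete, here is a bare-bones sketch of such a file. The stylesheet path and the placeholder content are assumptions; the templates in the repo also include title and author markup that this sketch leaves out, so they are the better starting point for a real paper.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Paper Title</title>
  <!-- Assumed path: point this at the pub.css of the format you chose -->
  <link rel="stylesheet" href="pub.css">
</head>
<body>
  <section>
    <h1>Introduction</h1>
    <p>Lorem ipsum</p>
  </section>
</body>
</html>
```

Saved as input.html, this is the file you would pass to the prince command in step 3.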

Documentation

OK, it’s not quite that simple. You still have to mark up your content. Here’s what the markup for a typical paper looks like. The main components are outlined below.

Sections are automatically numbered by PubCSS. Just follow this pattern.

<section>
  <h1>Section Header</h1>
  <section>
    <h2>Subsection Header</h2>
    <p>Lorem ipsum</p>
  </section>
</section>

Figures and tables are similarly numbered if you include a caption.

<table>
  <caption>Example Table</caption>
  <tr><td>1</td><td>2</td></tr>
</table>

<figure>
  <img src="graph.png">
  <figcaption>Example Figure</figcaption>
</figure>

References are also numbered. Don’t forget to assign them unique IDs.

<cite id="nicole">Nicole. 2015. Title of paper. <em>Journal</em>, 4(3), 1-10.</cite>

Citations to the references make use of these IDs.

<a href="#nicole"></a>

Sections, tables, and figures can also be referenced by adding a class.

<a href="#intro" class="section"></a>
<a href="#example-table" class="table"></a>
<a href="#example-figure" class="figure"></a>
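
The links are left empty because the stylesheet fills in the number. One way this can work in Prince is with CSS counters plus the target-counter() function. The sketch below shows the general technique for figures only; the counter name fig and the exact rules are assumptions for illustration, not PubCSS’s actual source.

```html
<style>
  /* Number figures with a CSS counter (hypothetical name: fig) */
  body { counter-reset: fig; }
  figure { counter-increment: fig; }
  figcaption::before { content: "Figure " counter(fig) ". "; }

  /* target-counter() reads the counter's value at the element the link points to */
  a.figure::before { content: "Figure " target-counter(attr(href), fig); }
</style>

<figure id="example-figure">
  <img src="graph.png" alt="Example graph">
  <figcaption>Example Figure</figcaption>
</figure>

<p>See <a href="#example-figure" class="figure"></a> for details.</p>
```

Note that the id in the link’s href has to match the id on the element being referenced, just as with references and citations.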

Equations are also numbered. MathML is well-supported by Prince. For the web, you’ll need MathJax to render MathML properly in Chrome and Internet Explorer.

<div class="equation">
  <math xmlns="http://www.w3.org/1998/Math/MathML">
    <mi>E</mi>
    <mo>=</mo>
    <mi>m</mi>
    <msup>
      <mi>c</mi>
      <mn>2</mn>
    </msup>
  </math>
</div>
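
For the web version, MathJax can be loaded with a single script tag. This is a sketch assuming MathJax 3 served from the jsDelivr CDN; mml-chtml is the MathJax component that takes MathML as input. Pin whichever version and host suit your paper.

```html
<!-- Assumed setup: MathJax 3 from jsDelivr, using the MathML-input component -->
<script id="MathJax-script" async
        src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/mml-chtml.js"></script>
```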

Footnotes are made within the body text, and are automatically moved to the bottom of the current page.

<p>This is text.<span class="footnote">And this is a footnote.</span></p>
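
In Prince, this kind of relocation is available through float: footnote. The following is a minimal sketch of the underlying CSS, not necessarily the exact rules PubCSS ships; the font size and weight are assumed values.

```html
<style>
  /* Prince moves the element to the footnote area at the bottom of the page
     and leaves a numbered call in its place in the body text */
  .footnote { float: footnote; }

  /* The call left behind in the body text */
  .footnote::footnote-call {
    font-size: 0.8em;
    vertical-align: super;
    line-height: 0;
  }

  /* The number shown beside the footnote itself */
  .footnote::footnote-marker { font-weight: bold; }
</style>

<p>This is text.<span class="footnote">And this is a footnote.</span></p>
```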

Smart quotes can be used in lieu of straight quotes by enclosing the text like so. You can nest quotes within quotes.

<q>To be or not to be.</q>
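
Nesting relies on the standard CSS quotes property, which takes pairs of opening and closing marks for each level. A sketch with assumed English-style marks follows; PubCSS’s own values may differ.

```html
<style>
  /* First pair: outer quotes. Second pair: quotes nested inside them. */
  q { quotes: "\201C" "\201D" "\2018" "\2019"; }
</style>

<p><q>She said <q>to be or not to be</q> and left.</q></p>
```

With these values, the outer quotation is set in double curly quotes and the inner one in single curly quotes.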

Utility classes are also available to modify layout and counter behavior.

  • col-2 divide the element into 2 columns
  • col-3 divide the element into 3 columns
  • col-4 divide the element into 4 columns
  • col-span span all of the columns of parent
  • col-break-before force column break before element
  • col-break-after force column break after element
  • page-break-before force page break before element
  • page-break-after force page break after element
  • counter-skip do not count this header
  • counter-reset reset counter to 0
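
Here is a sketch of how a few of these classes might be applied. The class names come from the list above; which elements they are placed on, and whether they can be combined as shown, are assumptions to check against the templates.

```html
<!-- Unnumbered heading: counter-skip keeps it out of the section count -->
<section>
  <h1 class="counter-skip">Acknowledgments</h1>
  <p>Lorem ipsum</p>
</section>

<!-- A wide figure that spans all columns of its multi-column parent -->
<figure class="col-span">
  <img src="graph.png" alt="Example graph">
  <figcaption>Example Figure</figcaption>
</figure>

<!-- Start the references on a new page, set in two columns -->
<section class="page-break-before col-2">
  <h1>References</h1>
  <cite id="nicole">Nicole. 2015. Title of paper. <em>Journal</em>, 4(3), 1-10.</cite>
</section>
```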

For customization, one of the major advantages of PubCSS is that you can use CSS. All of the usual rules apply.
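
For instance, a one-off tweak can live in a style block placed after the PubCSS stylesheet. The selectors and values below are hypothetical, chosen only to show the pattern.

```html
<link rel="stylesheet" href="pub.css">
<style>
  /* Loaded after pub.css, so these rules win when specificity is equal */
  figure img { width: 100%; }

  /* One-off adjustment for a single element, addressed by a hypothetical id */
  #big-table { font-size: 0.9em; }
</style>
```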

The architecture of PubCSS is similar to Bootswatch — a set of LESS files used to generate CSS for the current themes and to bootstrap new ones. If you want to make more extensive changes, or create a new theme, you’ll want to dig into the LESS source. The most common changes can be made through variables.less, such as toggling page numbers and setting counter styles.

Notes

PubCSS is meant to be a proof of concept, a demonstration of HTML and CSS as a viable alternative for formatting academic papers. Note that these templates are not officially sanctioned by ACM or IEEE. Unless a venue requires only the PDF output, they’ll also be expecting the source in the form of LaTeX files or a Word document.

One limitation of PubCSS is that citations in APA or MLA style (Park, 2015) are possible but clumsier than their numbered counterpart [1]. You also lose the magic of BibTeX for reference management, though on the bright side, all of your content is localized to a single file. Finally, when debugging PDF output, you really come to miss the developer tools available in web browsers today.

For the interactive web paper, I used JavaScript and hard-coded HTML to achieve many of the interactive effects. But once browsers start supporting CSS functions like target-counter and target-content, as Prince already does, it’ll be possible to implement most of these effects with only CSS. That is, a single pub.css file could output both print and fully interactive web versions. How cool.

Finally, get in touch if you have a request for another publication format. I’d love to hear from you if you’re interested in using PubCSS.

25 comments

  1. Very interesting article.

    Some food for thought from someone who used LaTeX a lot while doing research in Computer Science (Theory), and now works in the digital publishing scene.

    I think that the most important theme remained hidden. Take the MathML example: no sane person will write MathML directly, nor any other XML-like language. HTML is quite light, I agree, but the same observation applies to it as well. No author will write a hyperlink tag directly.

    In fact, I think that the interesting research direction here is about “tools to help people produce clean markup, without knowing any markup syntax”. LaTeX is quite popular, beyond its excellent typesetting capabilities, because it is relatively concise, given its expressive power. (The fact that MathJax renders LaTeX-formatted formulas is not a coincidence.) Clearly LaTeX is text-based, reflecting the era when it was created. Probably today the tools I mentioned above will need GUIs. But the core of the problem is the same.

    Once you have such a system in place, choosing the output (HTML, PDF, EPUB, whatever) is just a matter of changing the “output routine” and “loading the suitable templates”, according to the user’s needs. Of course, I welcome any effort to adopt open formats, especially the Open Web Platform.

    I hope to read more stuff from you soon.

    • Thanks for your thoughtful comments. I agree there’s an opportunity for a tool like this. A GUI could require less set-up and offer much-needed structure to the process, and even a WYSIWYG approach, which has its share of limitations, might be effective for a more constrained task like this.

      What attracts me about HTML & CSS is that there seem to always be those edge cases when it comes to formatting, like positioning a large figure or custom one-off styling. Lots of people are equipped to handle these using CSS.

  2. Thank you so much.

    Finally, being able to write for the web and still comply with these *$@# formatting guidelines for publication, rather than complying first and then wasting hours copy-pasting content.

    I really wanted to do this but every time the itch came, I couldn’t scratch it… for I had a paper to write ;)

    Thanks again!

    • Unless I understood each project wrong, Asciidoc is a means of generating HTML, while PubCSS is a stylesheet for it. I do not see how either overlaps with the other (quite the opposite, they could be complementary if PubCSS targeted Asciidoc’s HTML output).

      • That’s right, PubCSS is mainly a set of canned stylesheets that make the necessary HTML semantic and concise so that it can be authored directly, but could potentially support output from Asciidoc or MultiMarkdown.

    • >Asciidoc is all this and much more.

      True. But not everyone needs the much more. For some people, a simple and direct solution like PubCSS is best.

      There is no one-size-fits-all solution. Different people have different needs and for those who already know HTML and CSS, PubCSS is a viable option.

  3. I think there is a way to have most of the powerful features of LaTeX while hiding its complexities from a user’s point of view: LyX (www.lyx.org). I used it extensively while finishing my degree in Physics and I can’t imagine what it would take to typeset some of the equations in MathML.

    As Alberto Pettarin has stated, LaTeX was developed in a “text-only” (or text-mostly) era, but there is another reason why a text-based approach has lasted for so long: typing equations as you think about them is faster, and feels more “natural”, than selecting individual components of such equations from drop-down menus or something purely graphical.

    That’s why I chose LyX: when it comes to writing (or correcting what I’ve been writing), I don’t want to get distracted by the “extra” markup that’s needed to make something bold or italic. This is where graphical tools like LyX excel, as it feels as if you were using a word processor. And when it comes to writing mathematical expressions, you can do “most of it” by entering mathematical mode and just writing your equation. Most of the time you don’t have to learn the “code” for the symbols, as they are derived from their English names: int for the integration sign, phi, pi, sigma and so on for the Greek letters, etc. I don’t speak English natively (I’m from Spain), but it still feels more natural than having to take care of the markup language directly (HTML or LaTeX).

  4. Thanks for this. This is really great stuff. It actually reminds me of PeerJ’s recent initiative: https://github.com/peerj/paper-now. Maybe the two could even be combined for an even better framework?
    Your approach seems like something that could be combined with Pandoc http://johnmacfarlane.net/pandoc/. This could allow authors to write their papers in for example Markdown or even LaTeX and then convert it to HTML and apply the stylesheet. The latter case would be great for me, as I am quite experienced in LaTeX but looking for a way to produce sensible web-friendly output from it, i.e. re-flowable and non-paginated.

  5. You nailed it. This work has the potential to liberate researchers from struggling with LaTeX formatting for publications. The next step would be to integrate with something like Asciidoc so as to remove the need to write any markup. Has anyone submitted the resulting PDF and had it pass a PDF publication check such as IEEE PDF eXpress?

  6. The citation needs a unique ID, and I guess it would be better to use the DOI as this unique ID? That way, when the HTML files are generated, the cited entries could be linked via some mechanism. I am not a coder, so I couldn’t express that clearly.

    I guess this could be further integrated into the Snowball plugin so researchers would prefer WordPress more. It is somewhat like an online EndNote.

    • It would definitely be best to stick with a scheme like DOI, or if you’re using reference management software like EndNote or Zotero, the generated cite keys.

  7. Hi Thomas – thanks for your great work here!

    I just wanted to give you a heads up that the ACM SIGACCESS Conference on Computers and Accessibility is running a beta test for accepting HTML as a submission format this year and we are using PubCSS for our formatting. As the ACM’s premier conference on accessibility, we are working hard to make academic publications more accessible, both during the review process and in the final publication. Given how much more accessible HTML is than both Word and PDF documents, we see HTML+PubCSS as a great way to achieve this goal. Again, thanks for your hard work. We will keep you posted on where this ends up going.

    • Hey Alex, that’s awesome news! Can see a great opportunity to build HTML5 accessibility features in as well. Look forward to hearing how the trial goes for SIGACCESS.

  8. This is very good indeed and something that I need. I am not really familiar with CSS but I wonder how we should improve this idea for using with pandoc?

  9. The page’s filled with off-putting &lt;s instead of <s. Please fix.
    Also why are there two "Notify me of followup comments via e-mail" options?

  10. Am I understanding correctly?

    Due to the Prince dependency, there is only a paid version available for commercial use if we want to export our html to pdf reports?

    • For commercial use that’s right. When this project was started, Prince had the best support for “new” CSS features like columns and counters. A lot’s changed since then, and free options like Puppeteer could be dropped in.
