Conversation Analysis Transcript in LaTeX

Rendering Conversation Analysis (CA) transcripts in LaTeX can be time-consuming and outright frustrating. Since many, if not most, LaTeX users work in technical fields, this problem hasn’t been given much attention. In this post I will therefore show a few ways to do it.

The Problem with the Transcript

Humanities research often involves transcripts of dialogs or interviews. In my own work I often use CA to describe or showcase how people interact with robots or other technologies. One problem that keeps popping up is how to render CA transcriptions (using the Jefferson notation system) in LaTeX. The problem is complicated by the fact that I do all my transcripts in CLAN, as this program also enables me to do linguistic analyses in addition to the CA analysis, and, well, all my colleagues use it. One method I’ve been using for a while is simply to put the transcript in a tabular environment like this:

\begin{tabular}{llp{6.2cm}}
 1. & Robot: & welcome (.) my name is easy. \\
 2. & Robot: & what is yours? \\
 3. &          & (0.5)\\
 4. & Human: & my name is lene.\\
 5. &          & (7.5) \\
 6. & Robot: & hi leen.\\
 7. &           & (3.0)\\
 8. & Robot: & you are here to play a game with me. \\
\end{tabular}

This will also render quite nicely:

However, in CA we often also deal with overlapping speech, mark turn-final intonation contours, mark gaze patterns, etc. Rendering a transcript with this level of detail in a tabular environment can be a royal pain in the ***. Plus, sometimes you also want to add screenshots to certain lines.

Existing Solutions

Saul Albert, who runs a nice blog at saulalbert.net, has posted two solutions to this problem. One solution relies on a tabular environment, just as in my example above. He then uses \hspace to align overlapping speech and \raisebox to indicate overlap. In the other solution he uses Pandoc and Markdown to render his transcripts. It works well for sure, but I think the workflow is rather heavy. Plus, I don’t like the fact that I would have to rewrite all my carefully transcribed dialogs.
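To give an idea of what aligning overlap in a tabular environment involves, here is a minimal sketch. It is not Saul Albert’s code: it uses \hphantom (which reserves exactly the width of the text it is given) instead of hand-tuned \hspace values, and the overlapping lines are invented for illustration.

```latex
\documentclass{article}
\begin{document}

\begin{tabular}{lll}
 % Line 2 starts in overlap with "yours?" in line 1.
 % \hphantom{what is } takes up the width of "what is " without printing it,
 % so the two opening brackets end up directly below each other.
 1. & Robot: & what is [yours? \\
 2. & Human: & \hphantom{what is }[my name is lene. \\
\end{tabular}

\end{document}
```

The alignment holds with any font because the invisible text is measured in the same font as the visible text above it, but every overlap still has to be marked up by hand.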

Another solution is to use Clemens Horch’s CoNan environment (it’s available on GitHub here). From what I can tell it’s not available on CTAN, so you’ll have to download the package manually. It works okay, but only with XeLaTeX or LuaLaTeX, so there is no pdfLaTeX support. Again, it would also mean that I’d have to go over every detail of the transcript again.

This brings me to my final method, which is not without drawbacks either, but here goes.

My Solution

So the workflow is as follows: CLAN -> Editor (e.g. Adobe Illustrator) -> LaTeX

In CLAN:

Save your transcript from CLAN to PDF. This is not particular to CLAN, so this can be achieved with any transcription program (or text editor even) that can export to PDF.

In Illustrator:

Then you open the PDF in a program that can edit PDF files. I prefer Adobe Illustrator, but other software packages such as Inkscape should also do the trick. Since text elements are treated as text, you can edit line numbers, speaker IDs, and references to images, and change any visual element of the transcript. Then you save it again as a PDF file.

In LaTeX:

You add your transcript with \includegraphics, wrapping that command in a figure environment like this:

\begin{figure}[h]
\vspace{-10pt}
\includegraphics[scale=0.8]{transcript.pdf}
{\footnotesize Excerpt 3.1}
\vspace{-30pt}
\end{figure}

Depending on how you cut the white space in Illustrator, you might want to fiddle around a bit with the vertical spacing using \vspace. We also wanted to add images to the transcript. I’ve done that by adding the images in a tabular environment, but you could do this in Illustrator as well. However, some publishers prefer images separate from text:

\begin{tabular}{ccccc}
\#1 & \#2 & \#3 & \#4 & \#5\\
\includegraphics[scale=0.05]{1.png} & \includegraphics[scale=0.05]{2.png} & \includegraphics[scale=0.05]{3.png} & \includegraphics[scale=0.05]{4.png} & \includegraphics[scale=0.05]{5.png}\\
\end{tabular}
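The scale=0.05 value above depends on the pixel size of the screenshots. A possible alternative, sketched here under the assumption that all five images should share one line of a single-column page, is to size them relative to the line width with the width key of \includegraphics. The factor 0.15 is only a starting point, and the file names are the ones from the snippet above.

```latex
\documentclass{article}
\usepackage{graphicx}
\begin{document}

% @{} removes the padding at the outer edges of the table.
% Each image is 15% of the line width, which leaves room
% for the default column spacing between the five columns.
\noindent
\begin{tabular}{@{}ccccc@{}}
\#1 & \#2 & \#3 & \#4 & \#5 \\
\includegraphics[width=0.15\linewidth]{1.png} &
\includegraphics[width=0.15\linewidth]{2.png} &
\includegraphics[width=0.15\linewidth]{3.png} &
\includegraphics[width=0.15\linewidth]{4.png} &
\includegraphics[width=0.15\linewidth]{5.png} \\
\end{tabular}

\end{document}
```

With this, the row keeps fitting the page if the screenshots are later replaced by images of a different resolution.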

So, a full example looks like this:

\documentclass{article}
\usepackage{graphicx}
\begin{document}

\begin{figure}[h]
\vspace{-10pt}
\includegraphics[scale=0.8]{P30E1.pdf}
{\footnotesize Excerpt 3.1}
\vspace{-30pt}
\end{figure}

\begin{tabular}{ccccc}
\#1 & \#2 & \#3 & \#4 & \#5 \\
\includegraphics[scale=0.05]{P30-1.png} & \includegraphics[scale=0.05]{P30-2.png} & \includegraphics[scale=0.05]{P30-3.png} & \includegraphics[scale=0.05]{P30-4.png} & \includegraphics[scale=0.05]{P30-5.png}\\
\end{tabular}
\end{document}

This renders a document like this:

[Image: transcript]

With this solution you make only minor changes to your existing transcript, it’s technically relatively easy to do and your text is still recognized as text. I’d be happy to hear about other ways around this problem though.
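One possible refinement for the white space issue mentioned earlier, offered as a sketch only: graphicx can crop the included PDF itself through the trim and clip keys, which may reduce the need for negative \vspace values. The four lengths below are placeholders, not values from my own files, and have to be adjusted to the margins of the exported PDF.

```latex
\documentclass{article}
\usepackage{graphicx}
\begin{document}

\begin{figure}[h]
% trim takes four lengths in the order: left bottom right top.
% clip hides everything outside the trimmed area.
% The lengths are placeholders; adjust them to your own PDF.
\includegraphics[trim=1cm 18cm 1cm 2cm, clip, scale=0.8]{transcript.pdf}
{\footnotesize Excerpt 3.1}
\end{figure}

\end{document}
```

The trim lengths refer to the original page size of the PDF, so they stay the same when the scale value changes.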

6 Replies to “Conversation Analysis Transcript in LaTeX”

  1. I’ve been wrestling with this issue, but came up with a different solution. I used to include the transcripts as PDF images, where the transcripts were originally Word documents. This leads to a lot of fidgeting with the scale though (been there, done that). More importantly, should you get to a point where you want to publish the paper, you’ll have to edit separate files continuously.

    The best alternative I have is still a hassle though. I use the alltt package, which renders text in a monospaced (Courier-like) font. Then I copy-paste the transcripts into MiKTeX and replace all the symbols with the correct notation; e.g., \sp{\circ} for a degree sign. The advantage of this is that you can easily edit the transcripts in your document and you can use spaces to make sure the layout works. I also put everything in an \ex environment to make numbering and referring easier, and I use \small to make the font the right size (no more fidgeting with margins!). Another advantage is that if you decide to make changes and have to change the numbering, you can do that in the actual document.

    The major drawback is underlining; everything that’s underlined needs to be marked with \underline{}. And if you have a lot, that’s a big pain. But should you ever want to submit to a journal in LaTeX, I think this would be the journal’s ideal option, since everything is in the document itself.

  2. Thanks for the comment. I hadn’t come across the alltt package in my search, but I’m intrigued. What I was really looking for was a method that doesn’t require me to basically rewrite my transcript. The PDF in the solution above is saved as text and not as an image, so I haven’t experienced the fidgeting you mentioned. You could also number the transcript using \caption in the figure environment. That being said, I would be very interested to see your take on it. Would you mind posting a small example?

    1. Talk about your delayed responses, sure. The example is from a manuscript; as the data are in Dutch, there are three lines of transcript. As it was only a manuscript, it lacks underlining, but adding that would mean putting \underline{} around the parts that need to be underlined.

      Hope it’s readable this way.

      \ex\label{ST1}ST1 — 06:34.6-06:48.0
      \begin{alltt}\small
      01 Eli ik kom morge:n _
      or something home
      I’ll come home tomorrow _
      03 (0.6)
      04 Pad o\(\downarrow\)k\’e:.
      o\(\downarrow\)kay:.
      05 Eli ja:_
      yea:h_
      06 Pad -> >jayeah< (0.6) that’s \(\uparrow\)fine?
      12 (0.3)
      13 Eli ik had [gist-
      I had [yest-
      14 Pad [dan zie'k jou wel ver\(\uparrow\)schIJ:nen;
      then see.I you.SG ADV appear
      [then I’ll see you ap\(\uparrow\)pear;\end{alltt}\xe
