Joseph Herlant
version 1.0.0, 2013-10-26 : Initial version

1. What’s the issue with non-ascii characters?

In asciidoc (and a2x), documents are first transformed into an XML file, which is then processed by dblatex, fop or other tools.

In this XML file, each section is assigned an id derived from its title, with spaces replaced by underscores.
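
To make the id derivation concrete, here is a minimal Python sketch. It is not asciidoc's actual code: the function name section_id is hypothetical, and the lowercasing and the leading "_" prefix are assumptions about asciidoc's default behaviour. The point it shows is that a non-ASCII character in the title ends up unchanged in the id.

```python
import re


def section_id(title, prefix="_"):
    """Hypothetical sketch of deriving a section id from a title.

    Assumptions (not taken from the article): runs of non-word
    characters collapse to one underscore, the result is lowercased,
    and a "_" prefix is added.
    """
    base = re.sub(r"\W+", "_", title).strip("_").lower()
    return prefix + base


print(section_id("Umlauts e.g. ü break a2x headlines"))
# prints: _umlauts_e_g_ü_break_a2x_headlines
```

The "ü" survives in the id, and that is the part dblatex stumbles over by default.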

If you generate a PDF with dblatex (which is the default for an a2x -f pdf command) and your title contains non-ASCII characters, the XML file will contain them correctly, but dblatex will not handle them by default.

In this case, you will encounter an error as described in this Debian bug report. To summarize, the non-verbose mode prints this:

$ a2x -f pdf /tmp/test.txt
a2x: ERROR: "dblatex" -t pdf -p "/etc/asciidoc/dblatex/asciidoc-dblatex.xsl" -s "/etc/asciidoc/dblatex/asciidoc-dblatex.sty"   "/tmp/test.xml" returned non-zero exit status 1

And the verbose mode prints something like this:

$ a2x -v -f pdf /tmp/test.txt
a2x: args: ['-v', '-f', 'pdf', '/tmp/test.txt']
a2x: resource files: []
a2x: resource directories: ['/etc/asciidoc/images', '/etc/asciidoc/stylesheets']
a2x: executing: "/usr/bin/asciidoc" --backend docbook -a "a2x-format=pdf"  --verbose  --out-file "/tmp/test.xml" "/tmp/test.txt"

asciidoc: reading: /etc/asciidoc/asciidoc.conf
asciidoc: reading: /etc/asciidoc/asciidoc.conf
asciidoc: reading: /tmp/test.txt
asciidoc: reading: /etc/asciidoc/docbook45.conf
asciidoc: reading: /etc/asciidoc/filters/graphviz/graphviz-filter.conf
asciidoc: reading: /etc/asciidoc/filters/source/source-highlight-filter.conf
asciidoc: reading: /etc/asciidoc/filters/music/music-filter.conf
asciidoc: reading: /etc/asciidoc/filters/code/code-filter.conf
asciidoc: reading: /etc/asciidoc/filters/latex/latex-filter.conf
asciidoc: reading: /etc/asciidoc/lang-en.conf
asciidoc: writing: /tmp/test.xml

a2x: executing: "xmllint" --nonet --noout --valid "/tmp/test.xml"


a2x: executing: "dblatex" -t pdf -p "/etc/asciidoc/dblatex/asciidoc-dblatex.xsl" -s "/etc/asciidoc/dblatex/asciidoc-dblatex.sty"  -V  "/tmp/test.xml"

Build the book set list...
xsltproc -o /tmp/tmpAI1LA9/doclist.txt --xinclude --xincludestyle doclist.xsl /tmp/test.xml
Build the listings...
xsltproc -o /tmp/tmpAI1LA9/listings.xml --xinclude --xincludestyle --param current.dir '/tmp' /usr/share/dblatex/xsl/common/mklistings.xsl /tmp/test.xml
xsltproc -o test.rtex --xinclude --xincludestyle --param current.dir '/tmp' --param listings.xml '/tmp/tmpAI1LA9/listings.xml' /tmp/tmpAI1LA9/custom.xsl /tmp/test.xml
XSLT stylesheets DocBook - LaTeX 2e (0.3.4-3)
===================================================
Image 'dblatex' not found
Build test.pdf
built-in module pdftex registered
no support found for ifthen
no support found for ifxetex
no support found for fontspec
no support found for xltxtra
no support found for fontenc
no support found for inputenc
no support found for fancybox
built-in module makeidx registered
no support found for asciidoc-dblatex
building additional files...
checking if compiling is necessary...
the output file doesn't exist
pdflatex -interaction=batchmode test.tex
pdflatex failed
test.aux:27: Missing \endcsname inserted.
test.aux:27: leading text: ... e.g. � break a2x headlines}{section.1}{}}
test.aux:27: Missing $ inserted.
test.aux:27: leading text: ... e.g. � break a2x headlines}{section.1}{}}
test.aux:60: Missing $ inserted.
test.aux:60: leading text: \end{document}
Unexpected error occured
Error: pdflatex compilation failed

a2x: ERROR: "dblatex" -t pdf -p "/etc/asciidoc/dblatex/asciidoc-dblatex.xsl" -s "/etc/asciidoc/dblatex/asciidoc-dblatex.sty"  -V  "/tmp/test.xml" returned non-zero exit status 1

2. How to solve this issue?

There are two ways to solve the issue. I find the second one much cleaner and more convenient, but both are worth knowing.

2.1. The dblatex option

To make dblatex handle non-ASCII characters in ids properly, you have to install the Cyrillic support for TeX Live. On Debian, you can do:

sudo apt-get install texlive-lang-cyrillic

Then generate your documents with the latex.encoding parameter set to utf8, like this:

a2x --dblatex-opts="--param=latex.encoding=utf8" --no-xmllint /tmp/test.txt

To avoid typing this every time, you can add the option to the dblatex options in your asciidoc configuration file.

2.2. The built-in asciidoc (much cleaner) option

Alternatively, asciidoc can handle this for you with the built-in :ascii-ids: attribute, which converts non-ASCII characters in the generated ids to ASCII characters using Python's "unicodedata" module.

Example of using this attribute:

This is an example asciidoc-file
================================
Gerhard Siegesmund <jerri@jerri.de>
:ascii-ids:

Umlauts e.g. ü break a2x headlines
----------------------------------

Stable asciidoc (8.5.x) has no problems with this file.
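
To illustrate what the :ascii-ids: conversion amounts to, here is a minimal Python sketch built on the unicodedata module. It is an approximation under stated assumptions, not asciidoc's own implementation: the helper name to_ascii is hypothetical, and the use of NFKD normalization followed by dropping whatever is still non-ASCII is an assumption about how such a conversion is typically done.

```python
import unicodedata


def to_ascii(text):
    """Hypothetical helper: approximate an ASCII-only version of text.

    NFKD normalization splits an accented letter into its base letter
    plus combining marks; encoding to ASCII with errors="ignore" then
    drops the marks (and any character with no ASCII decomposition).
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("_umlauts_e_g_ü_break_a2x_headlines"))
# prints: _umlauts_e_g_u_break_a2x_headlines
```

Note that with this approach a character without a decomposition (for example "ß") is dropped entirely rather than transliterated, so the resulting id is ASCII-safe but not always a faithful spelling of the title.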