Commit Graph

43 Commits

Author SHA1 Message Date
Jan Engelhardt
d296c2d1d5
vmime: prevent loss of a space during text::createFromString (#306)
```
mailbox(text("Test München West", charsets::UTF_8), "a@b.de").generate();
```

produces

```
=?us-ascii?Q?Test_?= =?utf-8?Q?M=C3=BCnchen?= =?us-ascii?Q?West?= <a@b.de>
```

The first space between ``Test`` and ``München`` is encoded as an
underscore along with the first word: ``Test_``. The second space
between ``München`` and ``West`` is encoded with neither of the two
words and thus lost. Decoding the text results in ``Test
MünchenWest`` instead of ``Test München West``.

This is caused by how ``vmime::text::createFromString()`` handles
transitions between 7-bit and 8-bit words: if an 8-bit word follows a
7-bit word, a space is appended to the previous word. The opposite
case, a 7-bit word following an 8-bit word, *lacks* this handling.

When one fixes this problem, a follow-up issue appears:

``text::createFromString("a b\xFFc d")`` tokenizes the input into
``m_words={word("a "), word("b\xFFc ", utf8), word("d")}``. This
"right-side alignment" nature of the whitespace is a problem for
word::generate():

As per RFC 2047, spaces between adjacent encoded words are just
separators and are not meant to be displayed. A space between an
encoded word and regular ASCII text is not just a separator but is
also meant to be displayed.
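
The display rule above can be sketched in a few lines. This is only an illustration under stated assumptions: ``Token`` and ``displayText`` are hypothetical names and not part of vmime, the tokens are assumed to be already decoded, and every pair of neighbouring tokens is assumed to have been separated by exactly one space in the raw header.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one header token after decoding; not a vmime type.
struct Token {
    std::string text;  // decoded payload (UTF-8 bytes)
    bool encoded;      // true if it came from an =?charset?Q?...?= word
};

// Joins tokens that were separated by one space in the raw header,
// applying the RFC 2047 display rule: the space between two adjacent
// encoded-words is dropped, any other separating space is displayed.
std::string displayText(const std::vector<Token>& tokens) {
    std::string out;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        const bool betweenEncodedWords =
            i > 0 && tokens[i - 1].encoded && tokens[i].encoded;
        if (i > 0 && !betweenEncodedWords) {
            out += ' ';
        }
        out += tokens[i].text;
    }
    return out;
}
```

Fed the three encoded words from the example at the top (``Test_`` decodes to ``Test`` plus a space), this yields ``Test MünchenWest``: both separators sit between encoded words and are dropped, so only the space carried inside the first word survives.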

When word::generate() outputs the b-word, it would have to strip one
space, but only when there is a transition from encoded-word to
unencoded word. word::generate() does not know whether d will be
encoded or unencoded.

The idea now is that we could change the tokenization of
``text::createFromString`` such that whitespace is at the *start* of
words rather than at the end. With that, word::generate() need not
know anything about the next word, but rather only the *previous*
one.

Thus, in this patch,

1. The tokenization of ``text::createFromString`` is changed to
   left-align spaces and the function is fixed to account for
   the missing space on transition.
2. ``word::generate`` learns how to steal a space character.
3. Testcases are adjusted to account for the shifted
   position of the space.
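
A minimal sketch of what left-aligned tokenization could look like, for illustration only. It is not the code from this patch: ``Chunk``, ``hasEightBit`` and ``tokenizeLeftAligned`` are hypothetical names, only the plain space character is treated as whitespace, and adjacent words of the same class (7-bit or 8-bit) are simply merged.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical chunk type for illustration; vmime's real class is vmime::word.
struct Chunk {
    std::string text;  // raw bytes, with any separating space at the front
    bool eightBit;     // true if any byte has the high bit set
};

// Returns true if any byte of s lies outside the 7-bit ASCII range.
static bool hasEightBit(const std::string& s) {
    for (unsigned char c : s) {
        if (c & 0x80) {
            return true;
        }
    }
    return false;
}

// Splits the input at spaces and attaches each space to the word that
// follows it ("left-aligned" whitespace). Adjacent pieces of the same
// class are merged into one chunk.
std::vector<Chunk> tokenizeLeftAligned(const std::string& input) {
    std::vector<Chunk> chunks;
    std::size_t pos = 0;
    while (pos < input.size()) {
        // The piece starts at pos (possibly with a leading space) and
        // runs up to, but not including, the next space.
        std::size_t next = input.find(' ', pos + 1);
        if (next == std::string::npos) {
            next = input.size();
        }
        const std::string piece = input.substr(pos, next - pos);
        const bool eight = hasEightBit(piece);
        if (!chunks.empty() && chunks.back().eightBit == eight) {
            chunks.back().text += piece;
        } else {
            chunks.push_back(Chunk{piece, eight});
        }
        pos = next;
    }
    return chunks;
}
```

For the input ``"a b\xFFc d"`` this sketch produces the chunks ``"a"``, ``" b\xFFc"`` and ``" d"``, so each chunk carries the space that precedes it and a generator only needs to look at the previous chunk to decide how to emit that space.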

Fixes: #283, #284

Co-authored-by: Vincent Richard <vincent@vincent-richard.net>
2024-05-21 15:55:06 +02:00
Jan Engelhardt
c105165c6e
tests: switch a byte sequence in textTest (#305)
Replace the byte sequence with one that is similarly random but
happens to decode as valid UTF-8, so that the expected and actual
strings are shown with reasonable characters on a terminal.
2024-05-21 15:48:26 +02:00
Jan Engelhardt
b447adbe37
Fixes/comments for guessBestEncoding (#304)
* tests: add case for getRecommendedEncoding

* vmime: avoid integer multiply wraparound in wordEncoder::guessBestEncoding

If the input string is 42949673 characters long or larger, there will
be integer overflow on 32-bit platforms when multiplying by 100.
Switch that one computation to floating point.

* vmime: update comment in wordEncoder::guessBestEncoding
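
The wraparound threshold mentioned above can be checked with a short sketch. The helper names are hypothetical, and the assumption that the multiplication feeds a percentage-style ratio is mine; the real computation lives in ``wordEncoder::guessBestEncoding``. ``std::uint32_t`` stands in for a 32-bit ``size_t``.

```cpp
#include <cstdint>

// Hypothetical helpers; both require total != 0.

// Ratio computed in 32-bit unsigned arithmetic. count * 100 wraps around
// once count reaches 42949673, because 42949673 * 100 = 4294967300,
// which exceeds the largest 32-bit value, 4294967295.
std::uint32_t percentWrapping(std::uint32_t count, std::uint32_t total) {
    const std::uint32_t scaled = count * 100u;  // may wrap modulo 2^32
    return scaled / total;
}

// Same ratio computed in floating point, which does not wrap.
double percentFloating(std::uint32_t count, std::uint32_t total) {
    return static_cast<double>(count) * 100.0 / static_cast<double>(total);
}
```

With ``count`` and ``total`` both set to 42949672, the integer version still returns 100. One character more, at 42949673, the product wraps to 4 and the integer version returns 0, while the floating-point version returns 100.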
2024-05-21 15:47:05 +02:00
vincent-richard
5c00f7867a #238 Fixed whitespace between encoded words 2020-06-16 19:47:33 +02:00
Vincent Richard
b55bdc9c0b Code style and clarity. 2018-09-05 23:54:48 +02:00
Vincent Richard
c53e914ea5 Always ignore newlines between words. 2017-01-02 21:40:38 +01:00
Vincent Richard
5424aa2381 Fixed #149: don't lose charset when fixing invalid broken words. 2016-11-05 13:31:54 +01:00
Vincent Richard
c5c66f9fdc Issue #103: fix badly encoded words. 2015-02-16 18:43:03 +01:00
Vincent Richard
e7739c0efe Fixed issue #98: support for wrongly padded B64 words. 2015-01-14 19:35:28 +01:00
Vincent Richard
03a0e36e91 Added support for language specification in RFC-2047 encoded words and RFC-2231 parameter values. 2014-06-30 22:48:42 +02:00
Vincent Richard
ef892af655 Do not make calls to setlocale() in a library. Use default user locale in tests and examples. 2014-01-16 00:27:51 +01:00
Vincent Richard
f9913fa28a Boost/C++11 shared pointers. 2013-11-21 22:16:57 +01:00
Vincent Richard
1df8c6cd0e Refactored unit tests. 2013-03-08 08:19:55 +01:00
Vincent Richard
49f9628c0a Fixed typo in function name. 2013-02-25 13:10:15 +01:00
Vincent Richard
0c5d4a10e6 Message generation/parsing context. Charset conversion options. Preliminary implementation of RFC-6532. 2013-02-24 16:28:13 +01:00
Vincent Richard
ad9bef78c4 Updated copyright year and maintainer email address. 2013-01-10 17:30:31 +01:00
Vincent Richard
92b4dc8648 Fixed encoding of whitespace. Fixed old test case. 2011-06-26 12:47:25 +00:00
Vincent Richard
3cec9612fa Fixed possible infinite loop (thanks to John van der Kamp, Zarafa). 2011-01-21 15:28:06 +00:00
Vincent Richard
dbcb03893c Fold non-encoded lines in the case there is no whitespace in them. 2010-10-18 14:20:34 +00:00
Vincent Richard
097bde861d Fixed missing whitespace in text parsing. 2010-10-12 20:01:34 +00:00
Vincent Richard
e8cb19f9e5 Encode quotation marks in QP/RFC-2047. 2010-10-12 09:45:16 +00:00
Vincent Richard
4ff310c7e4 Always encode special charsets. 2010-05-21 07:41:15 +00:00
Vincent Richard
a5d258dc72 Relicensed VMime under the GNU GPL version 3. Changed copyright year to 2009. 2009-09-06 12:02:10 +00:00
Vincent Richard
439b2b3e90 Fixed extra space in subject (see https://sourceforge.net/forum/message.php?msg_id=4894970). 2008-04-28 19:49:48 +00:00
Vincent Richard
0c30c298da Changed copyright year to 2008. 2008-01-04 18:07:40 +00:00
Vincent Richard
a87652e7b4 Fixed incorrect white-space between words. 2007-11-20 21:45:54 +00:00
Vincent Richard
d284cfa729 Changed copyright year to 2007. 2007-01-01 20:55:15 +00:00
Vincent Richard
b79a6ad890 Fixed bug #1096610: non-integral number of chars in RFC-2047 encoded words. 2006-10-02 13:44:00 +00:00
Vincent Richard
63d21f7a09 Changed copyright year to 2006. 2006-02-05 10:22:59 +00:00
Vincent Richard
cbd1110a4b Updated FSF address. 2005-09-17 10:10:29 +00:00
Vincent Richard
5d18fce959 Moved to CppUnit for unit tests framework. 2005-08-25 21:25:45 +00:00
Vincent Richard
8cdddcdf03 Added test case for '?' in the middle of the encoded buffer. 2005-08-22 17:28:28 +00:00
Vincent Richard
681297e10b Reference counting and smart pointers. 2005-07-12 22:28:02 +00:00
Vincent Richard
ecae17af35 Fixed a bug in word parsing. 2005-06-13 16:45:21 +00:00
Vincent Richard
b3af751a92 Updated VMime website URL. 2005-03-18 21:33:11 +00:00
Vincent Richard
e0aabf8c72 More unit tests for 'text' class. 2005-03-15 10:32:52 +00:00
Vincent Richard
4315b50297 Added test for linear-white-space between encoded words. 2005-03-14 21:36:38 +00:00
Vincent Richard
51c199723c Changed year to 2005 in copyright header. 2005-01-03 12:26:48 +00:00
Vincent Richard
da55bd2c26 Autotools and libtool support. 2004-12-30 09:32:32 +00:00
Vincent Richard
4ce991d3b1 Moved all header files to 'vmime/' directory. 2004-12-26 20:23:29 +00:00
Vincent Richard
460cae786a Default platform handlers (currently only POSIX). 2004-12-18 01:57:39 +00:00
Vincent Richard
5868c87506 Moved encodeAndFold() and decodeAndUnfold() functions from "base.cpp" to "text.cpp". 2004-11-07 10:33:01 +00:00
Vincent Richard
418c0c1456 New build system for unit tests. 2004-11-06 10:48:58 +00:00