path: root/tests/parser
Date | Commit message | Author | Files | Lines
2024-05-21 | vmime: prevent loss of a space during text::createFromString (#306) | Jan Engelhardt | 2 | -5/+15
```
mailbox(text("Test München West", charsets::UTF_8), "[email protected]").generate();
```

produces

```
=?us-ascii?Q?Test_?= =?utf-8?Q?M=C3=BCnchen?= =?us-ascii?Q?West?= <[email protected]>
```

The first space between ``Test`` and ``München`` is encoded as an underscore along with the first word: ``Test_``. The second space between ``München`` and ``West`` is encoded with neither of the two words and thus lost. Decoding the text results in ``Test MünchenWest`` instead of ``Test München West``.

This is caused by how ``vmime::text::createFromString()`` handles transitions between 7-bit and 8-bit words: If an 8-bit word follows a 7-bit word, a space is appended to the previous word. The opposite case of a 7-bit word following an 8-bit word *misses* this behaviour.

When one fixes this problem, a follow-up issue appears:

``text::createFromString("a b\xFFc d")`` tokenizes the input into ``m_words={word("a "), word("b\xFFc ", utf8), word("d")}``. This "right-side alignment" nature of the whitespace is a problem for word::generate():

As per RFC 2047, spaces between adjacent encoded words are just separators but not meant to be displayed. A space between an encoded word and a regular ASCII text is not just a separator but also meant to be displayed.

When word::generate() outputs the b-word, it would have to strip one space, but only when there is a transition from encoded-word to unencoded word. word::generate() does not know whether d will be encoded or unencoded.

The idea now is that we could change the tokenization of ``text::createFromString`` such that whitespace is at the *start* of words rather than at the end. With that, word::generate() need not know anything about the next word, but rather only the *previous* one.

Thus, in this patch,

1. The tokenization of ``text::createFromString`` is changed to left-align spaces and the function is fixed to account for the missing space on transition.
2. ``word::generate`` learns how to steal a space character.
3. Testcases are adjusted to account for the shifted position of the space.

Fixes: #283, #284

Co-authored-by: Vincent Richard <[email protected]>
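
The left-aligned tokenization described in this commit can be illustrated with a small standalone sketch. It is not vmime's implementation: ``Chunk`` and ``tokenizeLeftAligned`` are hypothetical names, only the space character is treated as whitespace, and "8-bit" is assumed to mean "contains a byte with the high bit set". Each chunk carries the space that *preceded* it, so a generator would only need to look at the previous chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a vmime word: text plus a flag for 8-bit content.
struct Chunk {
    std::string text;  // may begin with the whitespace that preceded the word
    bool eightBit;     // true if any byte has its high bit set
};

// True if the piece contains at least one non-ASCII byte.
static bool hasHighBit(const std::string& s) {
    for (unsigned char c : s) {
        if (c & 0x80) return true;
    }
    return false;
}

// Split the input into runs of 7-bit and 8-bit words, attaching each
// separating space to the START of the word that follows it.
std::vector<Chunk> tokenizeLeftAligned(const std::string& in) {
    std::vector<Chunk> out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        // A piece is an optional run of spaces followed by a run of non-spaces.
        std::size_t end = in.find_first_not_of(' ', pos);
        if (end != std::string::npos) end = in.find(' ', end);
        if (end == std::string::npos) end = in.size();
        assert(end > pos);  // every iteration consumes at least one byte

        const std::string piece = in.substr(pos, end - pos);
        const bool eightBit = hasHighBit(piece);
        if (!out.empty() && out.back().eightBit == eightBit) {
            out.back().text += piece;  // same class as before: extend the chunk
        } else {
            out.push_back(Chunk{piece, eightBit});
        }
        pos = end;
    }
    return out;
}
```

With this sketch, the input from the commit message splits into ``"a"``, ``" b\xFFc"`` and ``" d"``, and concatenating the chunks reproduces the input unchanged.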
2024-05-21 | tests: switch a byte sequence in textTest (#305) | Jan Engelhardt | 1 | -2/+2
Switch out the byte sequence for one that is similarly random, but which happens to decode as valid UTF-8, so that the expected and actual strings are shown with reasonable characters on a terminal.
2024-05-21 | Fixes/comments for guessBestEncoding (#304) | Jan Engelhardt | 2 | -0/+21
* tests: add case for getRecommendedEncoding
* vmime: avoid integer multiply wraparound in wordEncoder::guessBestEncoding

  If the input string is 42949673 characters long or larger, there will be integer overflow on 32-bit platforms when multiplying by 100. Switch that one computation to floating point.
* vmime: update comment in wordEncoder::guessBestEncoding
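
The threshold of 42949673 follows from 2^32 / 100 ≈ 42949672.96. A minimal sketch of the wraparound, assuming the computation is a percentage of the form ``count * 100 / length`` carried out in a 32-bit unsigned type (the function names here are hypothetical and not taken from vmime):

```cpp
#include <cassert>
#include <cstdint>

// Percentage computed in 32-bit unsigned arithmetic. The product wraps
// modulo 2^32 once `count` reaches 42949673.
std::uint32_t percentWrapping(std::uint32_t count, std::uint32_t total) {
    assert(total != 0);
    const std::uint32_t scaled = static_cast<std::uint32_t>(count * 100u);
    return scaled / total;
}

// The same percentage in floating point, which does not wrap.
double percentFloating(std::uint32_t count, std::uint32_t total) {
    assert(total != 0);
    return 100.0 * static_cast<double>(count) / static_cast<double>(total);
}
```

For ``count == total == 42949672`` both variants give 100. One step further, at 42949673, the integer product is 4294967300, which wraps to 4, so the integer variant reports 0 percent while the floating-point variant still reports 100.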
2024-05-21 | vmime: avoid changing SEVEN_BIT when encoding::decideImpl sees U+007F (#303) | Jan Engelhardt | 1 | -0/+11
* vmime: avoid changing SEVEN_BIT when encoding::decideImpl sees U+007F

  Do not switch to QP/B64 when encountering U+007F. U+007F is part of ASCII just as much as U+0001 is.

---------

Co-authored-by: Vincent Richard <[email protected]>
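
The point about U+007F comes down to where the 7-bit boundary is drawn. The following sketch shows only that boundary check. ``isSevenBit`` is a hypothetical helper, and the real ``encoding::decideImpl`` may weigh further criteria that are not reproduced here.

```cpp
#include <cassert>  // for the checks in the usage note
#include <string>

// A byte string is 7-bit clean if every byte is in the range 0x00..0x7F.
// 0x7F (DEL) is included: it is an ASCII control character like 0x01.
bool isSevenBit(const std::string& s) {
    for (unsigned char c : s) {
        if (c > 0x7F) return false;
    }
    return true;
}
```

Under this check, ``assert(isSevenBit("abc\x7F"))`` holds, while a string containing UTF-8 bytes such as ``"caf\xC3\xA9"`` is reported as not 7-bit.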
2022-01-25 | Fixed possible recursion crash when parsing mailbox groups. | vincent-richard | 2 | -0/+43
2021-05-15 | Prevent accessing empty buffer | ibanic | 1 | -0/+27
2020-12-11 | Avoid force-encoding display names that fit within qcontent | Jan Engelhardt | 2 | -1/+14
When the display name contains an At sign, or anything of the sort, libvmime would forcibly encode this to =?...?=, even if the line is fine ASCII which only needs quoting. rspamd takes excessive quoting as a sign of spam and penalizes such mails by raising the score (rule/match: TO_EXCESS_QP et al.)
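
The distinction between quoting and encoding can be sketched as a three-way classification. This is an illustration under stated assumptions rather than libvmime's logic: ``NameForm``, ``classifyDisplayName`` and ``quoteDisplayName`` are hypothetical names, the set of characters that trigger quoting is the RFC 5322 "specials" list, and control characters (including DEL) are sent to the encoded path for simplicity.

```cpp
#include <cassert>
#include <string>

enum class NameForm { Plain, Quoted, Encoded };

// Decide the least invasive representation for a display name.
NameForm classifyDisplayName(const std::string& name) {
    // RFC 5322 "specials": allowed inside a quoted-string, not in an atom.
    static const std::string specials = "()<>[]:;@\\,.\"";
    bool needsQuoting = false;
    for (unsigned char c : name) {
        // Non-ASCII bytes and control characters: fall back to =?...?=.
        if (c < 0x20 || c >= 0x7F) return NameForm::Encoded;
        if (specials.find(static_cast<char>(c)) != std::string::npos) {
            needsQuoting = true;
        }
    }
    return needsQuoting ? NameForm::Quoted : NameForm::Plain;
}

// Wrap a printable-ASCII name in double quotes, escaping '"' and '\'.
std::string quoteDisplayName(const std::string& name) {
    assert(classifyDisplayName(name) != NameForm::Encoded);
    std::string out = "\"";
    for (char c : name) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    out += '"';
    return out;
}
```

A name such as ``Support @ Example`` is classified as ``Quoted`` and becomes ``"Support @ Example"``, with no encoded-word involved. Only a name with non-ASCII bytes reaches the ``Encoded`` case.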
2020-06-16 | #238 Fixed whitespace between encoded words | vincent-richard | 1 | -0/+85
2020-06-02 | Added test. | vincent-richard | 1 | -0/+10
2019-10-05 | Skip delimiter lines that are not exactly equal to the boundary | Jan Engelhardt | 1 | -0/+38
There is crap software out there that generates mails violating the prefix ban clause from RFC 2046 §5.1 ¶2. Switch vmime from a prefix match to an equality match, similar to what Alpine and Thunderbird do too.
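
The difference between the two matching strategies can be shown in a few lines. This is a sketch with hypothetical function names, assuming the line has already been stripped of its CRLF and carries no trailing whitespace. It does not reproduce vmime's parser.

```cpp
#include <cassert>
#include <string>

// Prefix match: accepts any line that merely STARTS with "--" + boundary.
bool matchesByPrefix(const std::string& line, const std::string& boundary) {
    assert(!boundary.empty());
    const std::string delim = "--" + boundary;
    return line.compare(0, delim.size(), delim) == 0;
}

// Equality match: accepts only the delimiter itself or the close
// delimiter ("--" + boundary + "--").
bool matchesExactly(const std::string& line, const std::string& boundary) {
    assert(!boundary.empty());
    const std::string delim = "--" + boundary;
    return line == delim || line == delim + "--";
}
```

With an outer boundary ``abc`` and a nested part whose boundary is ``abcdef``, the line ``--abcdef`` is wrongly taken as an outer delimiter by the prefix match, whereas the equality match skips it.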
2019-10-05 | Disregard whitespace between leading boundary hyphens and marker | Jan Engelhardt | 1 | -3/+3
The way I read the RFC is that whitespace is not allowed before the boundary marker, only afterwards, so the checks for leading WS are removed, and the missing check for trailing WS is added. See RFC 2046 §5.1.1: """The boundary delimiter line is then defined as a line consisting entirely of two hyphen characters ("-", decimal value 45) followed by the boundary parameter value from the Content-Type header field, optional linear whitespace, and a terminating CRLF."""
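
Read literally, the quoted definition allows whitespace only after the boundary value. A minimal sketch of that reading, assuming the CRLF has already been removed and ignoring the close delimiter (``isBoundaryDelimiterLine`` is a hypothetical name, not a vmime function):

```cpp
#include <cassert>
#include <string>

// Accept "--" + boundary followed only by optional spaces or tabs.
// Whitespace before or inside the "--boundary" part is rejected.
bool isBoundaryDelimiterLine(const std::string& line,
                             const std::string& boundary) {
    assert(!boundary.empty());
    const std::string delim = "--" + boundary;
    if (line.compare(0, delim.size(), delim) != 0) return false;
    // Everything after the boundary value must be linear whitespace.
    return line.find_first_not_of(" \t", delim.size()) == std::string::npos;
}
```

Here ``--abc`` followed by spaces or tabs is accepted, while a line with a leading space, or with a space between the hyphens and the boundary value, is rejected.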
2019-01-25 | Improve address parser for malformed mailbox specifications | Jan Engelhardt | 1 | -6/+6
Spammers use "Name <addr> <addr>" to trick some parsers. My expectations as to what the outcome should be are presented in the updated mailboxTest.cpp. The DFA in mailbox::parseImpl is hereby redone so as to pick the rightmost address-looking portion as the address, rather than something in between. While doing so, it also no longer mangles the name part (it does this by keeping an "as_if_name" variable around until the end).
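
The "rightmost address wins" idea can be approximated without a DFA. The sketch below is a deliberately simplified stand-in, not the code in mailbox::parseImpl: ``ParsedMailbox`` and ``pickRightmostAddress`` are hypothetical, quoting and comments are ignored, and the authoritative expectations remain those in mailboxTest.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct ParsedMailbox {
    std::string name;
    std::string address;
};

// Strip leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
    const std::size_t first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Take the LAST <...> pair as the address and keep everything before it,
// unmodified apart from trimming, as the name.
ParsedMailbox pickRightmostAddress(const std::string& in) {
    ParsedMailbox result;
    const std::size_t close = in.rfind('>');
    const std::size_t open =
        (close == std::string::npos) ? std::string::npos : in.rfind('<', close);
    if (open == std::string::npos) {
        result.address = trim(in);  // no angle brackets: bare address
        return result;
    }
    assert(open < close);
    result.address = in.substr(open + 1, close - open - 1);
    result.name = trim(in.substr(0, open));
    return result;
}
```

For ``Name <a@example.org> <b@example.org>`` this yields the address ``b@example.org`` and leaves ``Name <a@example.org>`` as the name, so the earlier bracketed part is neither chosen as the address nor rewritten.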
2019-01-24 | tests: add more malformation tests to mailboxTest | Jan Engelhardt | 1 | -4/+19
2018-09-15 | Removed 'stringProxy' since COW std::string is no longer valid in C++11. | Vincent Richard | 1 | -31/+0
2018-09-05 | Code style and clarity. | Vincent Richard | 27 | -800/+1132
2018-08-18 | Avoid copy by passing shared_ptr<> with const reference. | Vincent Richard | 3 | -4/+4
2018-03-12 | Added unit test related to PR #192. | Vincent Richard | 1 | -0/+10
2017-01-02 | Always ignore newlines between words. | Vincent Richard | 1 | -0/+4
2016-11-05 | Fixed #149: don't lose charset when fixing invalid broken words. | Vincent Richard | 1 | -13/+66
2016-03-13 | Issue #126: more warnings fixed. | Vincent Richard | 1 | -1/+1
2015-06-07 | Estimate generated size of parameterized field. | Vincent Richard | 1 | -0/+140
2015-05-03 | Fixed parsing of UTF8 email addresses (RFC-2047 local part + IDNA domain name). | Vincent Richard | 1 | -0/+16
2015-02-19 | Fixed unit test so that it does not depend on the current locale charset. | Vincent Richard | 1 | -1/+3
2015-02-16 | Issue #103: fix badly encoded words. | Vincent Richard | 2 | -2/+149
2015-01-14 | Fixed issue #98: support for wrongly padded B64 words. | Vincent Richard | 1 | -0/+22
2014-06-30 | Added support for language specification in RFC-2047 encoded words and RFC-2231 parameter values. | Vincent Richard | 3 | -6/+53
2014-06-17 | Allow choosing between encoding modes for parameter values. | Vincent Richard | 1 | -21/+87
2014-06-06 | Removed useless 'virtual' inheritance (fixed issue #84). | Vincent Richard | 1 | -1/+1
2014-06-01 | Fixed parsing of empty lines in header field value. | Vincent Richard | 1 | -0/+17
2014-01-15 | Do not make calls to setlocale() in a library. Use default user locale in tests and examples. | Vincent Richard | 3 | -0/+72
2013-12-10 | Simplified types for better readability. Use appropriate types (size_t, byte_t...). Minor warning fixes. | Vincent Richard | 3 | -6/+6
2013-11-21 | Boost/C++11 shared pointers. | Vincent Richard | 15 | -174/+178
2013-10-16 | Fixed group parsing in mailboxList. | Vincent Richard | 1 | -0/+48
2013-07-11 | Refactored the way embedded objects are referenced in MHTML. | Vincent Richard | 1 | -0/+3
2013-06-27 | Do not QP-encode CRLFs when content type is text. | Vincent Richard | 1 | -0/+66
2013-06-27 | Removed debug printf. | Vincent Richard | 1 | -1/+0
2013-06-26 | Unit tests for content handlers. | Vincent Richard | 4 | -0/+600
2013-06-24 | Added support for SIZE SMTP extension (RFC-1870). | Vincent Richard | 1 | -0/+57
2013-06-13 | Added support for transport padding in boundary (issue #38). | Vincent Richard | 1 | -0/+42
2013-06-09 | Fixed comment. | Vincent Richard | 1 | -1/+1
2013-03-25 | Added support for charset conversion with ICU (thanks to Mehmet Bozkurt). | Vincent Richard | 2 | -1/+12
2013-03-24 | Strip spaces at end of header lines (Zarafa). | Vincent Richard | 1 | -0/+17
2013-03-24 | Fixed warnings and 64-bit issues. | Vincent Richard | 2 | -10/+10
2013-03-24 | Let whitespace break the value of a parameterized header field, not just a ';' (thanks to Zarafa). | Vincent Richard | 1 | -0/+12
2013-03-24 | Fixed parsing header field value on next line. | Vincent Richard | 1 | -0/+17
2013-03-18 | Updated tests for charset conversion. | Vincent Richard | 3 | -222/+348
Added test for UTF-7 encoding availability. Added test for input buffer underflow in charsetFilteredOutputStream. Refactored charset conversion tests and removed useless tests.
2013-03-11 | Fixed mailbox and mailbox group parsing. Added unit tests. | Vincent Richard | 2 | -0/+115
2013-03-08 | Refactored unit tests. | Vincent Richard | 17 | -85/+17
2013-02-27 | Throw exception when an invalid value type is set in a header field. | Vincent Richard | 1 | -0/+56
2013-02-25 | Fixed typo in function name. | Vincent Richard | 1 | -3/+3