Columbia computer scientists invent FontCode, a way to hide information in ordinary text by imperceptibly changing the shapes of fonts in text. The method could prevent document tampering, protect copyrights, and embed QR codes and other metadata without altering the look or layout of a document.

New York, NY, May 10, 2018: Computer scientists at Columbia Engineering have invented FontCode, a new way to embed hidden information in ordinary text by imperceptibly changing, or perturbing, the shapes of fonts in text. FontCode creates font perturbations and uses them to encode a message that can later be decoded. The method works with most fonts and, unlike other text and document methods that hide embedded information, works with most document types, even preserving the hidden information when the document is printed on paper or converted to another file type. The paper will be presented at SIGGRAPH in Vancouver, British Columbia, August 12-16.
“While there are obvious applications for espionage, we think FontCode has even more practical uses for companies wanting to prevent document tampering or protect copyrights, and for retailers and artists wanting to embed QR codes and other metadata without altering the look or layout of a document,” says Changxi Zheng, associate professor of computer science and the paper’s senior author.
Zheng created FontCode with his students Chang Xiao (PhD student) and Cheng Zhang MS’17 (now a PhD student at UC Irvine) as a text steganographic method that can embed text, metadata, a URL, or a digital signature into a text document or image, whether it’s digitally stored or printed on paper. It works with common font families, such as Times Roman, Helvetica, and Calibri, and is compatible with most word processing programs, including Word and FrameMaker, as well as image-editing and drawing programs, such as Photoshop and Illustrator. Since each letter can be perturbed, the amount of information conveyed secretly is limited only by the length of the regular text. Information is encoded using minute font perturbations: changing the stroke width, adjusting the height of ascenders and descenders, or tightening or loosening the curves in serifs and the bowls of letters like o, p, and b.
“Changing any letter, punctuation mark, or symbol into a slightly different form allows you to change the meaning of the document,” says Xiao, the paper’s lead author. “This hidden information, though not visible to humans, is machine-readable just as barcodes and QR codes are instantly readable by computers. However, unlike barcodes and QR codes, FontCode doesn’t mar the visual aesthetics of the printed material, and its presence can remain secret.”
Data hidden using FontCode can be extremely difficult to detect. Even if an attacker detects font changes between two texts (highly unlikely given the subtlety of the perturbations), it simply isn’t practical to scan every file going in and out of a company.
Furthermore, FontCode not only embeds but can also encrypt messages. While each perturbation is stored at a numbered location in a codebook, those locations are not fixed. People wanting to communicate through encrypted documents would agree on a private key that specifies the particular locations, or order, of perturbations in the codebook.
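The idea of a shared key fixing the order of codebook entries can be illustrated with a minimal Python sketch. This is not FontCode's actual scheme: the function names, the integer key, and the codebook size of 9 are assumptions made for illustration, and Python's `random` module stands in for whatever keyed ordering the researchers use (it is not a cryptographically secure generator).

```python
import random


def keyed_order(codebook_size: int, key: int) -> list[int]:
    """Return a permutation of codebook slots determined by the shared key.

    Hypothetical helper: random.Random is used for illustration only and
    is not suitable for real cryptography.
    """
    order = list(range(codebook_size))
    random.Random(key).shuffle(order)  # same key -> same permutation
    return order


def scramble(index: int, order: list[int]) -> int:
    """Map a logical perturbation index to its keyed codebook slot."""
    return order[index]


def unscramble(slot: int, order: list[int]) -> int:
    """Invert scramble(): recover the logical index from a codebook slot."""
    return order.index(slot)


if __name__ == "__main__":
    order = keyed_order(9, key=1234)  # both parties derive this from the key
    slot = scramble(3, order)
    print(slot, unscramble(slot, order))
```

Sender and receiver who hold the same key derive the same permutation, so the receiver can undo the mapping. A reader without the key sees perturbations whose codebook numbering is unknown.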
“Encryption is just a backup level of protection in case an attacker can detect the use of font changes to convey secret information,” says Zheng. “It’s very difficult to see the changes, so they are really hard to detect. This makes FontCode a very powerful technique to get data past existing defenses.”
FontCode is not the first technology to hide a message in text (programs exist to hide messages in PDF and Word files or to resize whitespace to denote a 0 or 1), but, the researchers say, it is the first to be document-independent and to retain the secret information even when a document or an image with text (PNG, JPG) is printed or converted to another file type. This means a FrameMaker or Word file can be converted to PDF, or a JPG can be converted to PNG, all without losing the secret information.
To use FontCode, a user would supply a secret message and a carrier text document. FontCode converts the secret message to a bit string (ASCII or Unicode) and then into a sequence of integers. Each integer is assigned to a five-letter block in the regular text, where the numbered codebook locations of the letters sum to the integer.
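The encoding steps described above can be sketched in Python as follows. This is only an illustration of the description in this article, not the researchers' implementation: the codebook size of 9 perturbations per letter, the 5-bit chunk size, the greedy way of splitting an integer across a block, and all function names are assumptions.

```python
BLOCK_LEN = 5           # letters per block, as described in the text
GLYPHS_PER_LETTER = 9   # hypothetical codebook size per letter (indices 0..8)
BITS_PER_BLOCK = 5      # 2**5 - 1 = 31 <= 5 * 8 = 40, so every chunk fits


def message_to_integers(message: str) -> list[int]:
    """Convert an ASCII message to a bit string, then to small integers."""
    bits = "".join(f"{byte:08b}" for byte in message.encode("ascii"))
    bits += "0" * (-len(bits) % BITS_PER_BLOCK)  # pad to whole chunks
    return [int(bits[i:i + BITS_PER_BLOCK], 2)
            for i in range(0, len(bits), BITS_PER_BLOCK)]


def integer_to_block(value: int) -> list[int]:
    """Pick one codebook index per letter so the indices sum to value."""
    indices = []
    for _ in range(BLOCK_LEN):
        take = min(value, GLYPHS_PER_LETTER - 1)  # greedy split (assumption)
        indices.append(take)
        value -= take
    if value:
        raise ValueError("value too large for one block")
    return indices


def block_to_integer(indices: list[int]) -> int:
    """Decoding a block is just summing the recognized codebook indices."""
    return sum(indices)


def integers_to_message(values: list[int], n_chars: int) -> str:
    """Rebuild the ASCII message from the per-block integers."""
    bits = "".join(f"{v:0{BITS_PER_BLOCK}b}" for v in values)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, n_chars * 8, 8))
    return data.decode("ascii")


if __name__ == "__main__":
    values = message_to_integers("Hi")
    blocks = [integer_to_block(v) for v in values]
    print(values, blocks)
```

Under these assumptions, `message_to_integers("Hi")` yields `[9, 1, 20, 16]`, and the integer 20 becomes the block `[8, 8, 4, 0, 0]`: the first two letters use perturbation 8, the third uses perturbation 4, and the last two are left at index 0.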
Recovering hidden messages is the reverse process. From a digital file or from a photograph taken with a smartphone, FontCode matches each perturbed letter to the original perturbation in the codebook to reconstruct the original message.
Matching is done using convolutional neural networks (CNNs). Recognizing vector-drawn fonts (such as those stored as PDFs or created with programs like Illustrator) is straightforward since shape and path definitions are computer-readable. However, it’s a different story for PNG, IMG, and other rasterized (or pixel) fonts, where lighting changes, differing camera perspectives, or noise or blurriness may mask part of a letter and prevent easy recognition.
While CNNs are trained to take such distortions into account, recognition errors will still occur, and a key challenge for the researchers was ensuring a message could always be recovered in the face of such errors. Redundancy is one obvious way to recover lost information, but it doesn’t work well with text since redundant letters and symbols are easy to spot.
Instead, the researchers turned to the 1700-year-old Chinese Remainder Theorem, which identifies an unknown number from its remainders after it has been divided by several different divisors. The theorem has been used to reconstruct missing information in other domains; in FontCode, researchers use it to recover the original message even when not all letters are correctly recognized.
“Imagine having three unknown variables,” says Zheng. “With three linear equations, you should be able to solve for all three. If you increase the number of equations from three to five, you can solve the three unknowns as long as you know any three out of the five equations.”
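The "any three out of five" idea can be sketched with the Chinese Remainder Theorem in Python. This is a simplified illustration that only handles missing (unrecognized) remainders, not wrong ones, and it is not the researchers' actual decoder: the moduli, the size limit, and the function names are assumptions.

```python
from itertools import combinations
from math import prod

MODULI = [7, 9, 11, 13, 16]  # hypothetical, pairwise coprime divisors
LIMIT = 7 * 9 * 11           # secrets must be below the product of the
                             # three smallest moduli (693)


def encode(secret: int) -> list[int]:
    """Store the secret as five remainders, one per divisor."""
    if not 0 <= secret < LIMIT:
        raise ValueError("secret out of range")
    return [secret % m for m in MODULI]


def crt(residues: list[int], moduli: list[int]) -> int:
    """Standard CRT reconstruction for pairwise coprime moduli."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        x += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return x % total


def recover(known: dict[int, int]) -> int:
    """known maps a position in MODULI to its remainder; three are enough."""
    positions = sorted(known)[:3]
    if len(positions) < 3:
        raise ValueError("need at least three remainders")
    return crt([known[i] for i in positions], [MODULI[i] for i in positions])


if __name__ == "__main__":
    residues = encode(500)
    for kept in combinations(range(len(MODULI)), 3):
        print(kept, recover({i: residues[i] for i in kept}))
```

Because the secret is smaller than the product of any three of the divisors, any three surviving remainders pin it down uniquely. With these example moduli, `encode(500)` gives `[3, 5, 5, 6, 4]`, and every choice of three remainders recovers 500 even though two are discarded.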
Using the Chinese Remainder Theorem, the researchers demonstrated they could recover messages even when 25% of the letter perturbations were not recognized. In theory, the tolerable error rate could go higher than 25%.
The authors, who have filed a patent with Columbia Technology Ventures, plan to extend FontCode to other languages and character sets, including Chinese.
“We are excited about the broad array of applications for FontCode,” says Zheng, “from document management software, to invisible QR codes, to protection of legal documents. FontCode could be a game changer.”
—by Holly Evarts
About the Study
The study is titled “FontCode: Embedding Information in Text Documents using Glyph Perturbation.”
Authors are: Changxi Zheng, Chang Xiao, and Cheng Zhang (Department of Computer Science, Columbia Engineering).
The study was supported in part by the National Science Foundation.