Metadata-Version: 2.4
Name: binaryornot
Version: 0.6.0
Summary: Ultra-lightweight pure Python package to check if a file is binary or text.
Project-URL: bugs, https://github.com/binaryornot/binaryornot/issues
Project-URL: changelog, https://github.com/binaryornot/binaryornot/releases
Project-URL: documentation, https://binaryornot.github.io/binaryornot/
Project-URL: homepage, https://github.com/binaryornot/binaryornot
Author-email: Audrey Roy Greenfeld
Maintainer-email: Audrey Roy Greenfeld
License: MIT
License-File: LICENSE
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown

# BinaryOrNot

Python library and CLI tool to check if a file is binary or text. Zero dependencies.

```python
from binaryornot.check import is_binary

is_binary("image.png")    # True
is_binary("README.md")    # False
is_binary("data.sqlite")  # True
is_binary("report.csv")   # False
```

```sh
$ binaryornot image.png
True
```

## Install

```sh
pip install binaryornot
```

## Why not just check for null bytes?

That's the first thing everyone tries. It works until it doesn't:

- A UTF-16 text file is full of null bytes. Your tool thinks it's binary and corrupts it.
- A Big5 or GB2312 text file is full of bytes above 0x7F ("high ASCII"). Judged by byte ratios alone, it looks binary.
- A font file (.woff, .eot) is clearly binary but might not have null bytes in the first chunk.
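To make the first two pitfalls concrete, here is a minimal sketch using only the standard library. `naive_is_binary` and `high_byte_ratio` are hypothetical helpers written for this illustration. They are not part of binaryornot and do not reflect how it classifies files.

```python
# Hypothetical helpers for illustration only; they are NOT binaryornot's API.


def naive_is_binary(chunk: bytes) -> bool:
    """The null-byte heuristic: call a chunk binary if it contains any 0x00 byte."""
    return b"\x00" in chunk


def high_byte_ratio(chunk: bytes) -> float:
    """Fraction of bytes above 0x7F, i.e. outside 7-bit ASCII."""
    if not chunk:
        return 0.0
    return sum(1 for b in chunk if b > 0x7F) / len(chunk)


# UTF-16 (little-endian) text: each ASCII character is followed by a 0x00 byte.
utf16_text = "Hello, world!".encode("utf-16-le")
print(naive_is_binary(utf16_text))  # True, although this is plain text

# GB2312 text: both bytes of every Chinese character are above 0x7F,
# so there are no null bytes, but a "mostly high bytes" rule would still misfire.
gb2312_text = "二进制或文本".encode("gb2312")
print(naive_is_binary(gb2312_text))  # False
print(high_byte_ratio(gb2312_text))  # 1.0
```

Neither signal is reliable on its own, which is why a classifier has to weigh several of them together.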
BinaryOrNot reads the first 128 bytes of a file and runs them through a trained decision tree that considers byte ratios, Shannon entropy, encoding validity, BOM detection, and more. It handles all the edge cases above correctly, with zero dependencies.

Detection is tested against [37 text encodings and 49 binary formats](https://binaryornot.github.io/binaryornot/usage/), using parametrized tests driven from coverage CSVs.

## API

The main entry point is a single function:

```python
from binaryornot.check import is_binary

is_binary(filename)  # returns True or False
```

There's also `is_binary_string()` if you already have bytes:

```python
from binaryornot.helpers import is_binary_string

is_binary_string(b"\x00\x01\x02")  # True
is_binary_string(b"hello world")   # False
```

The [full documentation](https://binaryornot.github.io/binaryornot/) covers the detection algorithm in detail.

## Credits

Created by [Audrey Roy Greenfeld](https://audrey.feldroy.com).
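The detection description above names several signals (byte ratios, Shannon entropy, encoding validity, BOM detection) that the decision tree weighs. The following sketch computes a few of those signals with the standard library only. It is an illustration under stated assumptions, not binaryornot's implementation: `CHUNK_SIZE`, the feature names, and the helpers `shannon_entropy`, `is_valid_utf8_prefix`, and `extract_features` are hypothetical, and the sketch does not reproduce the trained tree that combines the features.

```python
import codecs
import math
from collections import Counter

# Assumption: mirrors the README's statement that only the first 128 bytes are inspected.
CHUNK_SIZE = 128

# Byte-order marks for the common Unicode encodings, taken from the stdlib codecs module.
BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
)


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    n = len(chunk)
    return sum(-(count / n) * math.log2(count / n) for count in Counter(chunk).values())


def is_valid_utf8_prefix(chunk: bytes) -> bool:
    """True if chunk decodes as UTF-8, tolerating a multi-byte character cut off at the end."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False: an incomplete trailing sequence is buffered instead of raising.
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        return False
    return True


def extract_features(data: bytes) -> dict:
    """Hypothetical feature vector for the first CHUNK_SIZE bytes of some data."""
    chunk = data[:CHUNK_SIZE]
    n = len(chunk) or 1  # avoid division by zero on empty input
    return {
        "null_ratio": chunk.count(0) / n,
        "high_byte_ratio": sum(1 for b in chunk if b > 0x7F) / n,
        "entropy": shannon_entropy(chunk),
        "utf8_valid": is_valid_utf8_prefix(chunk),
        "has_bom": chunk.startswith(BOMS),
    }


# Usage: UTF-16 text contains null bytes, but also a BOM that a classifier can weigh against them.
utf16 = extract_features("hello".encode("utf-16"))
print(utf16["null_ratio"] > 0, utf16["has_bom"])  # True True
```

Truncating to a fixed-size chunk keeps the cost constant regardless of file size, which is also why the UTF-8 check must tolerate a character split at the chunk boundary.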