Talk:Text file

This article is written in British English, which has its own spelling conventions (colour, travelled, centre, defence, artefact, analyse) and some terms that are used in it may be different or absent from other varieties of English. According to the relevant style guide, this should not be changed without broad consensus.

Computing High‑importance

This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as High-importance on the project's importance scale.

Merge with plain text

I agree that this should be merged with plain text. I think the result should be named "plain text", but the contents should mostly or completely come from this page, text file. --NealMcB 17:30, 25 April 2006 (UTC)

I disagree. These two terms differ completely: a text file is a file composed of text alone, while plain text is a snippet of usually unformatted text (which can appear inside any type of document, e.g. a presentation or a data sheet). Obviously, their differences should be clarified by correcting their definitions in their respective pages. PaF 21:41, 13 May 2006 (UTC)

Yes 'plain text' and 'text file' are not synonymous, but that doesn't mean they shouldn't be merged. The material of this article should be fitted into plain text, and 'text file' should be a sub-section thereof which simply says something like: 'a text file is simply a file that contains text data without any binary data.' If someone searches for 'text file', they should be directed to plain text. --Apantomimehorse 01:23, 5 July 2006 (UTC)

I also disagree. 70.58.81.57 22:47, 4 July 2006 (UTC)

I support the merge; the concepts are close enough that the distinction can be explained within one article. --Gerry Ashton 02:35, 1 August 2006 (UTC)

I object to the merge. While there is a degree of overlap in the terms, they are distinct enough to warrant separate articles. A text file can be plain text, RTF or HTML for example. Plain text files are text files but not all text files are plain text. Dread Lord CyberSkull ✎☠ 12:21, 10 August 2006 (UTC)

I'm going to take the merge tags off, as there seems to be more opposition to the merger than agreement and there's an awful backlog of articles to be merged. Let's get this off the list. Jaye 13:55, 24 August 2006 (UTC)

The .txt article says roughly the same things. 16@r 00:05, 24 October 2006 (UTC)

I also object. A text file can contain plain text, but they're two very different entities. To agree with an above poster, a text file can contain plain text or it can contain RTF or HTML. Also, the plain text article notes that plain text is only usually stored in files; it doesn't have to be. Alex Peppe 00:31, 2 January 2007 (UTC)

I also object. Plaintext may be transmitted through a network connection or held in memory. A text file refers to plaintext *stored* on a disk or tape. --Nil0lab 03:10, 24 March 2007 (UTC)

I object, because right now I am having trouble classifying text with my program, which uses file (Unix). Structured text such as XML has a lot of non-textual tags, while plain text (by definition) has none. Otherwise, I agree with PaF, Apantomimehorse, 70.58.81.57, Dread Lord CyberSkull, Alex Peppe and Nil0lab. Said: Rursus ☺ ★ 17:19, 25 June 2007 (UTC)

I disagree. A protocol using plain text is different to a text file (a protocol doesn't necessarily transfer files). 83.254.215.231 (talk) 12:21, 21 January 2008 (UTC)

binary (software)

Hi, I don't think you should move binary (software) to binary (computing) because it actually discusses binaries, i.e. compiled applications, whereas binary (computing) sounds like it's going to discuss how computers use 1s and 0s... Evercat 00:35 2 Jun 2003 (UTC)

First, my apologies for forgetting to discuss the naming issue on the talk page first. I think binary file is a more accurate term. What do you think? -- Taku 00:47 2 Jun 2003 (UTC)

If this article will predominantly be talking about binaries (as opposed to source code or any other form of text file), then I concur: binary file is a better name. -- Wapcaplet 00:50 2 Jun 2003 (UTC)

No, actually I think we need a broader article. Now that I see plain text covers more of the characteristics of binary files, we may want a combined article, probably called binary and text files or something. Any thoughts? -- Taku 00:55 2 Jun 2003 (UTC)

We probably do need a broader article. There isn't too much to say about binary files (aside from the fact that they're not human-readable, for whatever reason). In this context, the word "binary" is more of a piece of terminology, rather than a person/place/thing that needs an encyclopedia article. If anything, it should maybe be incorporated into File format or some related article. Some file formats are considered binary, yadda yadda. -- Wapcaplet 01:02 2 Jun 2003 (UTC) (Though, "human-readable" is pretty vague. Some humans, myself included, are capable of reading binary files and occasionally understanding them :) -- Wapcaplet

I was thinking of what title might be good. As people know, I tend to merge small articles into one big article because I believe Wikipedia is not a dictionary, so we don't want an article that just defines its own title. I am not sure file format is a good article in which to talk about the distinction between binary and text files. Actually, I want to discuss, for example, the fopen function of C, where you need to specify whether a file is binary or text. I mean, this topic, the distinction between binary and text, can be expanded a lot more. So after all, to avoid making one big article, an independent article called binary and text files seems fine. Any other ideas? -- Taku 01:20 2 Jun 2003 (UTC)

You're probably right. Evercat 01:22 2 Jun 2003 (UTC)

Sounds OK. Personally I'd stick it under File format, but you make a good point that there is the need to distinguish between these two broad classifications of file types. Some ideas:

"Text file" almost always refers to strict ASCII, though I suppose any information which can be interpreted according to some standardized character code (Unicode, UTF, or whatever) would qualify. Informally, as has already been established, it usually means "human readable," though there are times when ASCII can be used for other things (ASCII art, for instance) which don't really fall under the 'readable' category. Another way of looking at it is that you don't need any special software to view them (though the definition of 'special software' could be hairy. I get sick of trying to explain to people that you don't need to have Dreamweaver in order to edit an HTML file! :)
Binary files could be practically anything. As already pointed out, all files on a computer are binary in the strictest sense (text files are just special cases). Binary can be compiled executable code, object code or libraries, images, media such as audio or video, ZIP archives, or you name it.
In the context of downloading software, you often see "source code" versus "compiled binary executable", which is another apt analogy.

-- Wapcaplet 01:39 2 Jun 2003 (UTC)

Hi. Um, I have a slight problem with this article because it is rather Unix-centric. In Unix systems (traditionally), there was a very clear distinction between text files and binary files, primarily owing to the ASCII standard, i.e. by (Unix) definition, a file couldn't be a text file if it contained any character with a byte value over 127. Under Macintosh (and Windows???) systems, an extended, 256-character encoding was always used. It was completely accurate on a Macintosh to refer to a file as text so long as it was human readable. Today, the point is perhaps mostly moot, as Mac OS X is now Unix-based and Unicode has become the standard, but I think it still confuses Mac and Windows people when a Unixer talks about text files as being different from, say, a file that makes use of curly quotes or other high-bit characters in a particular encoding. AdmN 18:23, 30 Aug 2004 (UTC)
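A minimal sketch of the strict Unix-style test described above, assuming the criterion is simply "every byte is below 128". The function name `is_strict_ascii` is hypothetical, and Windows-1252 (Python codec `cp1252`) is used only as one example of an extended 8-bit encoding:

```python
def is_strict_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range (0-127)."""
    return all(b < 0x80 for b in data)


# The same sentence with straight quotes and with curly quotes.
plain = 'He said "hi".'.encode("ascii")
# Assumed example encoding: Windows-1252 stores each curly quote as one high-bit byte.
curly = "He said \u201chi\u201d.".encode("cp1252")

print(is_strict_ascii(plain))  # True
print(is_strict_ascii(curly))  # False
print(hex(curly[8]))           # 0x93: the left curly quote
```

Under this test the second file counts as "binary" even though a Mac or Windows user would call it text, which is the mismatch AdmN describes.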

Hmm, in my experience, Unix doesn't make this confusion at all. A file with only text data is a text file; everything else is not. Perhaps you are referring to the Unix utilities that try to infer whether a file is plain text or not, for it's true that these are historically ASCII-centric. More up-to-date utilities don't make this mistake, nor is any text/binary distinction built into the OS proper.

In fact, I would say that Windows makes the much worse confusion, for the binary/text option of the C file functions has no effect on Unix, but it does on Windows; all 'text mode' really does, however, is automatically convert any line feed byte into a carriage return followed by a line feed when writing (and the reverse when reading). (I don't know about Macs, but I'm guessing it converts line feed to carriage return and vice versa.) --Apantomimehorse 01:39, 5 July 2006 (UTC)
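The text-mode translation described above can be sketched in Python. The two helper functions are hypothetical and only emulate a CR LF text-mode platform (such as the Windows C runtime); they do not call C's `fopen`. The last lines use Python's real `io.TextIOWrapper` with `newline="\r\n"`, which performs the same translation on write:

```python
import io


def text_mode_write(text: str) -> bytes:
    # Emulate a CR LF text-mode write: every LF is stored as CR LF.
    return text.replace("\n", "\r\n").encode("ascii")


def text_mode_read(raw: bytes) -> str:
    # Emulate the matching text-mode read: every CR LF pair becomes LF again.
    return raw.decode("ascii").replace("\r\n", "\n")


text = "line one\nline two\n"
stored = text_mode_write(text)
print(stored)                          # b'line one\r\nline two\r\n'
print(text_mode_read(stored) == text)  # True

# Binary mode would store `text` byte for byte with no translation.
# Python's own text layer can be asked for the CR LF translation explicitly:
buf = io.BytesIO()
wrapper = io.TextIOWrapper(buf, encoding="ascii", newline="\r\n")
wrapper.write(text)
wrapper.flush()
print(buf.getvalue() == stored)        # True
```

On Unix the "text" and "binary" variants would behave identically, because LF is already the native line terminator there.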

Requested move

Text files → Text file – {use singular form as per std}

Add *Support or *Oppose followed by an optional one-sentence explanation, then sign your vote with ~~~~

Support Singular is more encyclopedic, I feel. UrbaneLegend 23:01, 17 February 2006 (UTC)

Done, for standardization. Rd232 ^talk 22:17, 18 February 2006 (UTC)

Disputed

Confusing!!

The article defines text files as approximately plain text, but this is essentially wrong. A text file contains text that is intended for human information transfer, as opposed to binary or data files, which for the most part remain opaque to the ordinary user. This means that

MS Word produces text files.
An even more confusing example is HTML files, which cannot be called plain text but are usually regarded as text files.

The article must be enhanced to treat structured text alongside plain/flat text. Said: Rursus ☺ ★ 19:50, 25 June 2007 (UTC)

Besides, having MIME in the article contradicts the intro. MIME is a somewhat structured text. Said: Rursus ☺ ★ 20:10, 25 June 2007 (UTC)

MS Word doesn't produce text files, since it doesn't produce files that are human-readable without special software (unless you want to look through all the binary blobs for the occasional bit of your text.) Happysmileman (talk) 19:32, 6 December 2007 (UTC)

Analysis of intro

A text file (or plain text file) is a computer file which contains only ordinary textual characters with essentially no formatting. [a text file is a file intended to be human readable, not computer readable (binary)] The term 'text file' is typically used in contrast with the term 'binary file', even though any file is fundamentally a sequence of arbitrary bits, and many computer components (for example, all hard disk circuitry and most system software) make no distinction between file types. [that's confusing!] However, a large percentage of application programs can understand and use text files in some way, but few programs can typically understand and use the contents of any particular binary file. Hence the distinction can be useful to computer users. [this misses the point: since text files are intended for humans to read, not computers, data loss and data confusion hurt much less. Plain text is just the readability extreme; structured text can either be processed to look nice, or the structural markup can be removed, and the purpose is still not lost.] Said: Rursus ☺ ★ 20:05, 25 June 2007 (UTC)

Evil rewrite made

Yihiheee!! (Giggling evilly, twirling the moustaches)! I simply rewrote the intro to refer to human text information files. I know there's a conflict between three different interpretations of text files:

text file = plain text file - doesn't contain control characters, but may contain newline characters,
text file - is intended for humans to read,
text file - is readable by any unspecialized text editor, and may be compiled or interpreted by a programming language.

So in essence my change was too drastic, and if the original meaning is also reinserted alongside mine, I will be happy too. Said: Rursus ☺ ★ 20:59, 25 June 2007 (UTC)

tag for rewrite ;; what is going on with this article?

There is some seriously dubious content going into this article, and it is consequently tagged for re-write. It may be suitable to revert this to a previous version, but something needs to be done.

For example:

   A text file is a file intended for humans to read, so it mainly 
   contains character data that can be processed to display a readable 
   text in any natural language.

Where does this definition come from? This whole "intended for humans" definition sounds vague, unencyclopedic and pointless. Assembly programmers are humans too, blind people are humans too; and what's with the "red on green" coloring in the article body? Can someone show where this formatting is recommended under WP style guidelines?

Please, have some citations and reliable sources nearby when making substantial modifications to this article. They are desperately needed. dr.ef.tymac 23:57, 25 June 2007 (UTC)

Basic cleanup done: Initial cleanup has been done. This is a start on cleanup, but the article still needs attention. Please: do not add definitions or substantial revisions unless you can back it up with citations to reliable sources. Thanks. dr.ef.tymac 00:54, 26 June 2007 (UTC)

It's me! Your objections are highly relevant, since I felt that redefining in the direction I proposed was taken too far. However, the original text seemed quite doubtful to me, because it was at odds with the merge debate between plain text and text file, where many commenters objected to the merger on the basis of structured text (e.g. XML). The problem is that there is an ambiguity in the term text file. I think there is no official definition of what a text file is. One definition is based on the use of control codes 0x00..0x1F and 0x80..0x9F within the file; the other is based on how the file is used. Both definitions have limitations and cases where they become absurd: for human text heavily tagged with control codes (e.g. an MS Word DOC file, which is then both binary and textual) the usage definition becomes absurd, and for plain-text XML, which uses no control codes yet is heavily tagged and pretty unreadable, the control-code definition does. The trouble is that "text file" is ambiguous and that the article must reflect this ambiguity.

I adhere to your standpoint that the text shouldn't be touched without adding citations, with one exception: if anyone objects to my changes and wishes to restore some of the former text alongside the current text (referring to the text before, say, 24 June 2007), that's better than OK by me! Now I'm going to improve the article by finding the citations that we want to add. Said: Rursus ☺ ★ 11:26, 26 June 2007 (UTC)
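A sketch of the control-code definition mentioned above, assuming the check is applied to decoded characters (so that UTF-8 continuation bytes in the 0x80..0x9F range are not mistaken for control codes) and that tab, line feed and carriage return are allowed. The function name and the allowed set are illustrative assumptions, not an established standard:

```python
ALLOWED_CONTROLS = {"\t", "\n", "\r"}  # assumption: layout characters are permitted


def looks_like_text(data: bytes, encoding: str = "utf-8") -> bool:
    """Heuristic: decodable in `encoding` and free of C0/C1 control codes."""
    try:
        chars = data.decode(encoding)
    except UnicodeDecodeError:
        return False  # the bytes do not map to characters in this encoding
    for ch in chars:
        cp = ord(ch)
        is_c0 = cp <= 0x1F
        is_c1 = 0x80 <= cp <= 0x9F
        if (is_c0 or is_c1) and ch not in ALLOWED_CONTROLS:
            return False
    return True


print(looks_like_text(b"<note>heavily tagged</note>\n"))  # True: XML passes
print(looks_like_text(b"abc\x00def"))                     # False: NUL byte
print(looks_like_text(b"\x93hi\x94", "latin-1"))          # False: C1 codes in ISO 8859-1
print(looks_like_text(b"\x93hi\x94", "cp1252"))           # True: curly quotes in Windows-1252
```

Heavily tagged XML passes, while the verdict on the last two inputs depends entirely on which encoding is assumed, which is the ambiguity Rursus points out.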

BTW: Wiktionary treats "text file" as "human readable relatively unformatted text" and later on "not being binary", and "being distinguished from word processing files". The trouble is structured texts, which really are regarded as text files. Wiktionary is a tertiary source and cannot be used for citations, but it gives a preliminary hint on where to go from here. Said: Rursus ☺ ★ 11:42, 26 June 2007 (UTC)

I agree with your basic point that "text file" has ambiguity. To be blunt, I don't personally like the term very much, but it is sufficiently widespread and common to be notable and citable, and hence this article gets to stay instead of being summarily deleted.

Nevertheless, because the term is so ambiguous, reliable cites (to sources that fully acknowledge and understand the inherent ambiguity) are the only authoritative solution here. This is why the "human-readable" definition lacks merit. With all due respect to the contributors to Wiktionary, the term "human-readable" is worse than meaningless. Unless you come from a planet where computers program themselves, and there are no humans who do that work, then even many "binary" files are intended to be "human-readable" at least at some point or another.

Bottom line: thanks for clarifying, and for helping to clear up some of the problems with the ambiguity. I think the best way to proceed is for us to resolve the ambiguity (and debates) by requiring contributors to start adding more cites. Good job on the work you've done so far to bring these issues to light. dr.ef.tymac 15:00, 26 June 2007 (UTC)

You don't like the term text file, and I may in a sense agree, because computer science is so full of confusion arising from the perspectives of various actors: the customer, the programmer's boss, the programmer and the end user. "Text file" as an official term might not exist, or it may be defined differently by various technical committees in the Anglo-Saxon world. As a linguistic compound it certainly exists, and then it means what "text" and "file" imply to us, while at the same time it has a de facto usage (probably technical) that is as valid as the linguistic compound. This is the eternal struggle to be understood that we compsciers must constantly fight, until we've invented our own separate language. Said: Rursus ☺ ★ 08:22, 27 June 2007 (UTC)

Sources of variable quality

Please add everything you find here (!):

"wiktionary.org on "text file"" - source: tertiary (3); confuses the usage-based definition with the technical-criteria definition;
"The Jargon File (version 4.4.7) on "text"" - source: primary? (1?); uses only the technical-criteria definition, plus another meaning unknown to me;
"Webopedia on "text file"" - source: tertiary (3); very vaguely distinguishes between files containing text and files using only ASCII (obsolete, but generalizes to any character encoding);
"MSN Encarta on "text file"" - source: secondary (2); uses only the "contains only alphanumeric characters" definition (which must charitably be interpreted to include spaces, parentheses and punctuation);
"lookwayup.com on "text file"" - source: tertiary (3); uses a definition declaring that only ASCII is used, including "formatting instructions";
"Foldoc.org on "text file"" - source: primary (1); says that text files don't contain "invisible" control characters; contrasts them with rich text, binary files and flat files.

Sample usages: "MAVID file input description". Kidisk //musikk —Preceding unsigned comment added by 88.91.88.113 (talk) 21:05, 17 July 2010 (UTC)

the criticism is distracting

I was distracted by the criticism tags in the document, e.g. "citation needed", "vague"... I'm lazy right now; maybe when I get home I will address the criticisms.

--146.145.210.126 12:55, 20 July 2007 (UTC)

Stupid image

The current image accompanying this article should be replaced with something non-stupid. —Preceding unsigned comment added by Radishes (talk • contribs) 2007-08-03 20:59:26

Cool. Fire up your favorite SVG editor and create a better one; if the image is so bad, this should be a piece of cake. dr.ef.tymac 02:25, 4 August 2007 (UTC)

Highly questionable wording in intro

Sourced or not, this is nonsense:

"text files are intended to be viewed or interpreted by application software, whereas binary files are executable by the operating system."

Just to prove my point: Blender .blend files, MS Word DOC files, JPG, PNG, GIF, TIFF, OGG, MP3, WAV, AIFF, etc are all examples of "binary files" which are "intended to be viewed or interpreted by application software".

Meanwhile, MS-DOS .bat files, Unix shell scripts, and programs written in Perl, Python, BASIC, and other interpreted languages are examples of "text files" which are "executable by the operating system". The footnote about "source code" doesn't alter this fact: compilation or interpretation often happens in RAM and often no binary file is created in the process.

So, this is a totally wrong distinction to make.

Also, note [4] is NOT a source for this statement, it's an explanatory footnote.

What distinguishes "text files" from "binary files" is that the byte stream in a text file has a simple, unambiguous mapping to a sequence of characters which may be rendered as human-readable glyphs, arranged in a simple human-readable form.

We do use application software to do this rendering, but that is equally true of many binary files, so it is not a distinctive property of text files. The definition of text file is also tightly linked with the concept of a "text editor", which is an application specifically designed to manipulate text files. Indeed, a good definition of a text file is "a file which may be easily processed using a text editor".

"Plain text", though it probably has more than one distinct meaning (and probably therefore deserves a disambiguation?), in this context, means a text file which furthermore does not contain special formatting instructions (unlike XML or HTML, for example), usually called "markup". Thus, "plain text" contains little or no structure (this is fuzzy because paragraph breaks, newline characters, and setting headings off by empty lines can all be thought of as exceptions).

The term "text file" actually dates from a time when it was essentially synonymous with "ASCII encoded file", but the rise of other encodings, and particularly Unicode, has stretched the meaning by making what we think of as "text files" more complex and more ambiguous. But even so, there is a clear distinction between a straightforward representation of text and a rich-text or page-description language which contains complex formatting information. Digitante (talk) 14:32, 11 February 2008 (UTC)

I agree that statement was very misleading. It couldn't be allowed to stand, so I removed it. The intro needs to be enlarged now, I'd guess, but I'm not prepared to do that; feel free. Note that there are plain text, plaintext, etc. articles in existence. -R. S. Shaw (talk) 21:32, 11 February 2008 (UTC)

End-of-file marker?

The article says:

"The end of a text file is often denoted by placing one or more special characters, known as an end-of-file marker, after the last line in a text file."

This is incorrect information supported by a [weasel word]. Most file systems don't use the concept of "end-of-file" marker, and most systems definitely don't use a special marker for the last character in a text file.

It is arguable, on the other hand, whether the last line of a text file is or isn't ended by a newline marker (whichever it is: CRLF, CR or LF). But the newline marker is definitely not a marker for the end of the file.

-- Rgiusti (talk) 15:19, 9 August 2012 (UTC)
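To illustrate the difference between line terminators and the end of the data, here is a small sketch (the function name `describe_lines` is hypothetical). It relies on Python's `bytes.splitlines()`, which recognises LF, CR and CR LF, and separately reports whether the final line is terminated. The end of the data is simply the end of the byte string; no marker is involved:

```python
from typing import List, Tuple


def describe_lines(data: bytes) -> Tuple[List[bytes], bool]:
    """Split on LF, CR or CR LF and report whether the last line is terminated."""
    lines = data.splitlines()
    last_terminated = data.endswith((b"\n", b"\r"))
    return lines, last_terminated


print(describe_lines(b"one\r\ntwo\r\n"))  # ([b'one', b'two'], True)
print(describe_lines(b"one\ntwo"))        # ([b'one', b'two'], False)
print(describe_lines(b"one\rtwo\r"))      # ([b'one', b'two'], True)
```

Both files yield the same two lines; the only difference is whether the last one carries a terminator, which is exactly the arguable point raised above.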

Sentence lacking grammar

"According to Unicode Microsoft protocol for txt files use UTF-8." I cannot parse this sentence. Can someone who knows what it is trying to say make it meaningful? — Preceding unsigned comment added by 94.224.53.151 (talk • contribs) 21:02, 7 August 2013 (UTC)

Note that as of 2018, while Microsoft *claims* to use Unicode, most, if not all, of its compilers use a subset of it. Microsoft (as of early 2018, the last time I checked) is inconsistent in its Unicode usage. Microsoft's implementation of UTF-8 is NOT 100% compliant with the standard. 72.16.99.93 (talk) 05:40, 25 November 2018 (UTC)

In any case, the offending sentence has disappeared. I guess what replaced it, or replaced the paragraph containing it, is:

Most Windows text files use "ANSI", "OEM", "Unicode" or "UTF-8" encoding. What Windows terminology calls "ANSI encodings" are usually single-byte ISO/IEC 8859 encodings (i.e. ANSI in the Microsoft Notepad menus is really "System Code Page", non-Unicode, legacy encoding), except in locales such as Chinese, Japanese and Korean that require double-byte character sets. ANSI encodings were traditionally used as default system locales within Windows, before the transition to Unicode. By contrast, OEM encodings, also known as DOS code pages, were defined by IBM for use in the original IBM PC text mode display system. They typically include graphical and line-drawing characters common in DOS applications. "Unicode"-encoded Windows text files contain text in UTF-16 Unicode Transformation Format. Such files normally begin with a byte order mark (BOM), which communicates the endianness of the file content. Although UTF-8 does not suffer from endianness problems, many Windows programs (e.g. Notepad) prepend a BOM to the contents of UTF-8-encoded files,^[1] to differentiate UTF-8 encoding from other 8-bit encodings.^[2]

And I think Windows has traditionally been more UCS-2/UTF-16-oriented for Unicode text; "Microsoft's implementation of UTF-8 is NOT 100% compliant with the standard." may reflect their lack of strong interest in supporting UTF-8. They may be improving their UTF-8 support now, as Windows systems have to deal with it more (in network protocols and when exchanging data with UN*X systems). Guy Harris (talk) 06:18, 25 November 2018 (UTC)

References

^ "Using Byte Order Marks". Internationalization for Windows Applications. Microsoft. Retrieved 2015-12-15.
^ Freytag, Asmus (2015-12-18). "FAQ – UTF-8, UTF-16, UTF-32 & BOM". The Unicode Consortium. Retrieved 2016-05-30. Yes, UTF-8 can contain a BOM. However, it makes no difference as to the endianness of the byte stream. UTF-8 always has the same byte order. An initial BOM is only used as a signature — an indication that an otherwise unmarked text file is in UTF-8. Note that some recipients of UTF-8 encoded data do not expect a BOM. Where UTF-8 is used transparently in 8-bit environments, the use of a BOM will interfere with any protocol or file format that expects specific ASCII characters at the beginning, such as the use of "#!" of at the beginning of Unix shell scripts.
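A minimal sketch of how a reader might use the BOM signatures described above, using the constants from Python's standard `codecs` module. The helper name `sniff_bom` is hypothetical, and a file without a BOM (for example an "ANSI" file) simply cannot be identified this way. The last lines illustrate the Unicode FAQ's warning: a UTF-8 BOM stops a script from starting with `#!`:

```python
import codecs

# Known signatures; none of these is a prefix of another, so order does not matter.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE (typical of Windows "Unicode" files)
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def sniff_bom(data: bytes):
    """Return (encoding name or None, data with any BOM removed)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data


utf8_file = codecs.BOM_UTF8 + "hello".encode("utf-8")
utf16_file = codecs.BOM_UTF16_LE + "hello".encode("utf-16-le")

enc, body = sniff_bom(utf8_file)
print(enc, body.decode(enc))        # utf-8 hello
enc16, body16 = sniff_bom(utf16_file)
print(enc16, body16.decode(enc16))  # utf-16-le hello
print(sniff_bom(b"hello")[0])       # None: no signature to go on

script = codecs.BOM_UTF8 + b"#!/bin/sh\necho hi\n"
print(script.startswith(b"#!"))     # False: the BOM comes first
```

Python also offers the `utf-8-sig` codec, which strips a leading UTF-8 BOM during decoding if one is present.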

Curious observation

I happened to notice today a peculiar dogma or "double standard" that is being implicitly asserted in this article with regard to textual/string representations. On the one hand, a "modern" OS is said to no longer require end-of-file markers, seemingly equating progress with this feature. However, on the other hand, lines themselves are still typically ended with newline character(s), so it would seem that this "anachronism" survives at the line level of detail. To my knowledge, this would be due to the difference in the abstractions themselves. File lengths, being the domain of the OS, are apparently more "modern" than the file formats, which are effectively invisible to the OS.

All in all, to me, this seems to reflect a bit of glibness within the article: it recognizes the trifling optimization at the OS level while ignoring the bigger potential optimization at the line level. I'm not sure what to make of this, except that I find this observation interesting, and would prefer that the article not be so dogmatic. 75.139.254.117 (talk) 04:22, 20 November 2016 (UTC)

Well, end-of-file markers are redundant because the information about file size is otherwise stored in file metadata, and this information is required there because of different properties of modern file systems. File metadata does not store information about individual lines within a file, mostly because there is no good use for such information outside text file editing/displaying, in which case significant portions of the file would be read anyway. Of course such metadata could be stored in some file system, but then the associated data structures would end up taking more space than a single LF byte, which is hardly an optimization. So basically the article is right in recognising the deprecation of explicit EOF markers but not of explicit EOL markers. — Dmitrij D. Czarkoff (talk•track) 18:02, 27 June 2017 (UTC)
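A small sketch of the point above, using only Python's standard library: the length of a file comes from file-system metadata (`os.stat(...).st_size`), so reading stops there without any end-of-file byte, whereas line boundaries can only be found by scanning the content for LF bytes. The temporary file and its contents are purely illustrative:

```python
import os
import tempfile

data = b"first line\nsecond line\n"

# Write the bytes to a temporary file; no end-of-file marker is appended.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(data)
    path = f.name

try:
    size = os.stat(path).st_size  # the length is recorded in file-system metadata
    with open(path, "rb") as f:
        content = f.read()        # reading ends after `size` bytes
    print(size == len(data), content == data)  # True True
    print(content.count(b"\n"))   # 2: line ends are found only by scanning the bytes
finally:
    os.remove(path)
```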

Translation: EOF information is not necessary because the start of the file (or of the current segment of the file) as well as its size (or the location of its end, which is equivalent) is available externally to the file. A line, by contrast, may consist of zero, one or 100 thousand characters. For instance, Microsoft Notepad doesn't insert line breaks in lines that 'run off' the screen. That is quite appropriate, usually. The only way to know where a displayed line will (or should) break is to know both the details of the display device and the details of the font to be used to display it. Neither is generally available to the file. 72.16.99.93 (talk) 08:25, 25 November 2018 (UTC)

Move discussion in progress

There is a move discussion in progress on Talk:CTXT (media) which affects this page. Please participate on that page and not in this talk page section. Thank you. —RMCD bot 10:45, 13 May 2018 (UTC)

Wrong on many levels.

I have serious issues with the article. It generally assumes that a text file is intended for display. That's just not true. Text files are used for a variety of reasons, even when the information is NOT intended for display (or printing). The article claims the file "is" composed of characters. Not really: the file is composed of binary data (virtually always) which "should be" interpreted as computer characters. (Here computer characters include letters (graphemes), digits, punctuation marks, and control characters, what the Unicode Consortium calls 'code points'.) Text files are used generally because they can be easily understood by humans, not because they will be. That is, they might be used to encode information to ensure its quality, or because the interpretation of the contents is straightforward, simple and (assuming the reader is literate in the underlying language) direct, should that ever be necessary. The fact that most browsers and word processors can easily display the contents of a text file (due to historical precedent) is another reason, but we have to keep in mind that even the simplest display requires a whole lot of computer code to take the binary bits on a magnetic film or charges on a silicon chip and create dots of light on a computer monitor from them. Is that task substantially easier than interpreting a binary file? Not necessarily; in fact, interpretation of a binary file may be easier and faster for the computer/electronics than display of a text file. The article is written as if the author believes that these characters actually exist in the file. That is a simple way to look at it, and if the audience is middle-school students it might be the best way, but some acknowledgement of the reality wouldn't be too difficult. 72.16.99.93 (talk) 08:16, 25 November 2018 (UTC)
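The point that a file holds bytes which "should be" interpreted as characters can be shown with a few lines of Python: the same five bytes decode to different text depending on the encoding assumed. UTF-8 and ISO 8859-1 (`latin-1`) are chosen here purely as examples:

```python
raw = bytes([0x63, 0x61, 0x66, 0xC3, 0xA9])  # the five bytes actually stored

as_utf8 = raw.decode("utf-8")      # "café": C3 A9 is one two-byte character
as_latin1 = raw.decode("latin-1")  # "cafÃ©": each byte is its own character

print(list(raw))                       # [99, 97, 102, 195, 169]
print(len(as_utf8), len(as_latin1))    # 4 5
print(as_utf8 == "caf\u00e9")          # True
print(as_latin1 == "caf\u00c3\u00a9")  # True
```

Nothing in the bytes themselves says which reading is intended; that knowledge lives outside the file.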

Article fails to acknowledge levels of abstraction.

Some people here are trying to define what a "text file" actually is, but they struggle because they don't acknowledge that a single file can be more than one thing. I have a certain file on my computer:

It is a text file,
It is an XML file,
It is a Scalable Vector Graphics (SVG) file, and
It is an Inkscape document.

When I say that it is a "text file," what I mean is that it makes sense, under some circumstances, to open the file in a "plain text" editor (a.k.a. "programmer's editor"). The Inkscape document format is defined as annotated SVG, SVG is defined as an XML application, and XML is defined as plain text that obeys certain syntax rules. That's four distinct levels of abstraction, and that's without even broaching the subject of what the document looks like when rendered as SVG.

I also have an .xhtml file. That's even more fun to describe because it is text on more than one level: it's text, represented as XHTML, which is a form of XML, which is represented as plain text. 173.75.33.51 (talk) 19:30, 31 August 2020 (UTC)
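The layering described above can be checked one level at a time with Python's standard library. This is only a sketch: the tiny document, including its `inkscape:version` value, is made up for illustration, and the last check merely looks for an attribute in the Inkscape namespace rather than validating the Inkscape format:

```python
import xml.etree.ElementTree as ET

raw = (
    b'<svg xmlns="http://www.w3.org/2000/svg" '
    b'xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape" '
    b'inkscape:version="1.0" width="10" height="10">'
    b'<rect width="10" height="10"/></svg>\n'
)

# Level 1, text: the bytes decode to characters.
text = raw.decode("utf-8")

# Level 2, XML: the text obeys XML syntax and parses into an element tree.
root = ET.fromstring(text)

# Level 3, SVG: the root element is <svg> in the SVG namespace.
SVG_NS = "{http://www.w3.org/2000/svg}"
is_svg = root.tag == SVG_NS + "svg"

# Level 4, Inkscape document: SVG annotated with Inkscape-namespace attributes.
INKSCAPE_NS = "{http://www.inkscape.org/namespaces/inkscape}"
has_inkscape_data = any(
    key.startswith(INKSCAPE_NS) for el in root.iter() for key in el.attrib
)

print(is_svg, has_inkscape_data)  # True True
```

Each level only makes sense once the one below it has succeeded, which is why the same file can fairly be called text, XML, SVG and an Inkscape document at once.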

Infobox is inaccurate and should be moved

Guy Harris: The article refers to a "text file", which is any file that represents text. However, the infobox refers to the plain text file type, which semantically represents text files containing plain text. This is obvious in the MIME type, which the infobox says is "text/plain", even though the scope of the actual article is about all text files, which includes any "text/*" file. PBZE (talk) 18:30, 28 November 2021 (UTC)

"Plain text" is not a file type. "Plain text" is a type of text, whether it's in a file or not; for example, the control channel for FTP is plain text, but it's rarely written to a text file, and the content of a pcapng capture file comment option is usually plain text, but pcapng files aren't text files.

So if the infobox should be moved to plain text, it should be moved to a section that discusses files that contain only plain text, i.e. plain-text files. Guy Harris (talk) 19:09, 28 November 2021 (UTC)

Fundamental misunderstanding

All computer files are binary – EVEN “TEXT” FILES. As stored on the hard drive or in memory, “TEXT” files (such as hello.txt or mydata.json) consist only of bits (0’s and 1’s). The reason we see text in them and can read them is that we typically open them in applications such as Notepad, Word, etc. that can display the bits as text. The application makes the file readable. Without these and similar applications, a “TEXT” file (such as hello.txt or mydata.json) would be indistinguishable to the human eye from binary data; it would be unreadable. — Preceding unsigned comment added by 131.119.15.14 (talk) 18:13, 8 March 2022 (UTC)

Yes, that's what the binary file page says:

A binary file is a computer file that is not a text file. The term "binary file" is often used as a term meaning "non-text file". Many binary file formats contain parts that can be interpreted as text; for example, some computer document files containing formatted text, such as older Microsoft Word document files, contain the text of the document but also contain formatting information in binary form.

The best way to think about the distinction between a "text file" and a "non-text file" is that a "dumb" program that just sends characters to a terminal would correctly display a "text file" but would not correctly display a "non-text file". Guy Harris (talk) 21:07, 8 March 2022 (UTC)
