C's most infamous parsing difficulty is: A * B; Is it A multiplied by B, or B de...

omegalulw · on March 17, 2022

I don't see why resolution without a "symbol table" is a big deal. Knowing whether A is a type or not resolves these fairly easily. And in well-written code, it should generally be obvious whether A is a type or not, so it should not be a readability problem either.

WalterBright · on March 17, 2022

Because a quite different AST is built for the second case depending on whether A is a type or not. And you can't tell whether A is a type or not without a symbol table.
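A minimal C sketch of the two cases, with function names made up purely for illustration. The same tokens form a declaration where A names a type, and an expression where an inner declaration has hidden the typedef:

```c
#include <stddef.h>

/* At file scope, A names a type. */
typedef int A;

/* Hypothetical helper: here `A * B;` is a declaration. */
size_t as_declaration(void)
{
    A * B;              /* declares B with type `int *` */
    return sizeof B;    /* size of a pointer */
}

/* Hypothetical helper: here the very same tokens are an expression. */
int as_multiplication(void)
{
    int A = 6, B = 7;   /* inner A hides the typedef; A is now a variable */
    A * B;              /* expression statement, result discarded */
    return A * B;
}
```

A compiler will typically warn that the second statement has no effect, which it can only say because it already looked A up.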

xscott · on March 17, 2022

I think the two of you are talking past each other. You're saying, "yes, you need a symbol table", and he's saying, "yes, but a symbol table isn't very hard to do".

Personally, I wonder how many of these things would go away if pointers were a suffix operator with a different character (maybe the @ sign), and if casts looked like function calls.

   A B@;   # B is a pointer to type A
   A(-B);  # A is either a function or a type for casting
           # Same AST regardless

WalterBright · on March 17, 2022

As for symbol tables being hard to do, notice that C does not allow forward references. Supporting forward references while relying on the symbol table to drive the parse winds up with unresolvable problems.

xscott · on March 17, 2022

I'm not sure I understand your point. I don't think we're disagreeing about anything.

C needs forward declarations for some things, and it needs a symbol table to resolve some parts of the grammar. All I was saying was that I think you could resolve both of them with minor changes. (I see that D has a "cast" keyword, and that's obviously one way to do it.)

esfandia · on March 17, 2022

> A B@; # B is a pointer to type A

IIRC that's how Pascal did it, using a caret ^ for denoting a pointer.

benibela · on March 17, 2022

Kind of.

A pointer to type A is:

     var B: ^A

The parser knows it is a declaration because of the var, and it knows ^A is the type because of the colon. That it is a caret does not really matter here.

xscott · on March 17, 2022

Yeah, now if only Pascal had used curly braces, all could be right in the world :-)

WalterBright · on March 18, 2022

I don't know about Pascal, but AT&T allowed anyone to write a C compiler without needing a license. (C++, too, I know because I asked AT&T's lawyers.)

ksec · on March 17, 2022

Arh.... But then it wouldn't be Pascal ><

peterashford · on March 17, 2022

Yes! As he said - all would be right with the world =)

zerr · on March 17, 2022

It was more about AT&T monopoly I believe :)

PoignardAzur · on March 17, 2022

It's a problem because it means your editor will have a very hard time parsing your file without analyzing a lot of other files first.

The grammar of your file depends on previously declared symbols. But which symbols have been declared depends on the header.h files you import. But which headers you import depends on the -I options you give to your compiler. Except there's no standardized way to express what -I options your project uses, and they might change depending on your build profile.
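A small C sketch of that dependency, under stated assumptions: the macro A_IS_A_TYPE is hypothetical and stands in for "which header the include path happened to find", and the function name is made up. The function body is byte-for-byte identical in both configurations, yet it parses differently:

```c
#include <stddef.h>

/* Stand-in for two different headers that could be found via -I. */
#ifdef A_IS_A_TYPE
typedef int A;          /* one header declares A as a type */
#else
static int A = 6;       /* another declares A as a variable */
#endif

static int B = 7;

size_t size_of_b_after_statement(void)
{
    A * B;              /* declaration of a local B, or a multiplication */
    return sizeof B;    /* sizeof(int *) in the first case, sizeof(int) in the second */
}
```

Building with -DA_IS_A_TYPE turns the statement into a declaration of a local pointer B that shadows the file-scope one. An editor that does not know the build flags cannot tell which reading applies.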

Any modern text editor can give you good syntax highlighting for a Rust file or a Go file basically as soon as it opens. When it opens a C/C++ file, it has to do a lot of guessing.

(This is not conjecture, by the way. I tried to integrate clang's implementation of the Language Server Protocol in Atom for my end-of-studies project, and it was not fun.)

bodhiandphysics · on March 17, 2022

It’s annoying when parsing C. Also, the situation is worse in C++, since * can have side effects!

WalterBright · on March 17, 2022

You're right, and C++ is hopeless to parse without a symbol table in other ways. Later versions of C++ added keywords to disambiguate, but of course there's legacy code.

choeger · on March 17, 2022

Because pretty much the whole science of parsing is built for context-free languages. Yes, of course, you can ignore science when you do your thing, but please don't call yourself an engineer, then ;).

sharpneli · on March 17, 2022

And as an engineer you can slap on a symbol table and it works like a context free language after that.

cryptonector · on March 17, 2022

Mainly because most parser generators don't have a feature for disambiguation based on the application providing hints based on symbol table lookups.

samhw · on March 17, 2022

It seems like you maybe misread their comment as referring to human parsing of code, rather than writing a parser program? I can’t make much sense of this otherwise.

stevefan1999 · on March 17, 2022

This is because there would be extra memory involved, which could make it hard to deploy in low-memory environments. Consider the historical background of C, from the late '70s to the '90s, and you'll see why.

But thanks to Moore's Law, offset by Wirth's Law, you have significantly more powerful hardware yet not much faster software, because we started to deploy languages worse than C: C++. C++ with templates is Turing-complete, which means template expansion is undecidable in general: it could run forever, a problem most compilers sidestep by imposing a "recursion depth" limit. Templates alone, even leaving "recursion depth" aside, consume more memory than C does, which is why we complain that C++ compilers are memory hogs while C has (relatively) low memory overhead. It's just that times are different and memory use is no longer such a visible concern.

kaba0 · on March 17, 2022

The problem has nothing to do with extra memory, it’s just that C parsers are more hacky than others due to the C language not having a proper context-free grammar.

Also, I frankly don’t get why templates are “bad” according to you. They allow for proper metaprogramming not relying on ugly hacks like memory layout conventions, etc.

stevefan1999 · on March 18, 2022

I'm not saying templates are bad. Maybe my wording was ambiguous, but I do find template metaprogramming too complicated to begin with. Yes, I'm fascinated by the many things TMP made possible (such as std::tuple, std::bitset, Boost.Asio, Boost.Hana), but because of that complexity, to this day I still cannot commit to my idea of creating my own C++ compiler.

relaxing · on March 17, 2022

It’s both.

mhh__ · on March 17, 2022

You need those symbol tables anyway, though.

Someone · on March 17, 2022

> But there's another way that works very well: it's a declaration. The reason is simple. The multiply has no purpose, and so nobody would write that.

But there's another way that works very well: it's a multiplication. The reason is simple. A declaration of a variable that isn’t used has no purpose, and so nobody would write that.

As I think you know, C doesn’t handle this case by guessing that it must be a declaration. Its lexer looks in the tables that its parser creates to check whether a type called ‘A’ is in scope (https://stackoverflow.com/questions/41331871/how-c-c-parser-...)
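A rough C sketch of that feedback loop, not taken from any real compiler. Every name here is hypothetical, and the "symbol table" is a flat list of typedef names with no scoping:

```c
#include <string.h>

enum token_kind { TOK_IDENTIFIER, TOK_TYPE_NAME };

#define MAX_TYPEDEFS 64

/* Hypothetical, deliberately tiny table of typedef names seen so far. */
static const char *typedef_names[MAX_TYPEDEFS];
static int typedef_count;

/* The parser calls this when it finishes a typedef declaration.
   Returns 0 on success, -1 if the table is full. */
int declare_typedef(const char *name)
{
    if (typedef_count >= MAX_TYPEDEFS)
        return -1;
    typedef_names[typedef_count++] = name;
    return 0;
}

/* The lexer calls this for every identifier it scans. */
enum token_kind classify_identifier(const char *name)
{
    for (int i = 0; i < typedef_count; i++)
        if (strcmp(typedef_names[i], name) == 0)
            return TOK_TYPE_NAME;
    return TOK_IDENTIFIER;
}

/* Once the token kind is known, `lhs * B;` has only one reading. */
const char *interpret_star_statement(const char *lhs)
{
    return classify_identifier(lhs) == TOK_TYPE_NAME
        ? "declaration" : "multiplication";
}
```

Before declare_typedef("A") is called, interpret_star_statement("A") yields "multiplication"; afterwards it yields "declaration". A real implementation would also have to track scopes, since an inner declaration can hide a typedef name.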

kazinator · on March 18, 2022

> The multiply has no purpose, and so nobody would write that.

A fellow named Bjarne Stroustrup fixed that bug, though. In the plus plus dialect of C, A * B could reboot your system, without any #define macros for A or B.