This is a classic case of a 'why' code comment being needed. It's obvious what y...

dmurray · on Aug 16, 2020

Seems like it would be even better to put this in its own function with a descriptive name, ascii_tolower or roman_tolower or whatever, that has exactly the semantics you want.

gregmac · on Aug 16, 2020

This is exactly right, is and is a great example of what self-documenting code can be. The function itself could have a bit more explanation but any code calling it is going to be obvious.

The big difference is it looks deliberate, instead of just code written by someone trying to micro-optimize, be very clever, or who just didn't realize tolower() exists. Most people will pause before just replacing it, and likewise it should trigger questions in the PR.

eitland · on Aug 16, 2020

Still warrants a comment to be sure no one concludes the built in is good enough.

randomdata · on Aug 16, 2020

It certainly warrants a test to document what the function is for.

And if that test also happens to validate that the documentation is accurate, that is a nice side benefit.

h0l0cube · on Aug 17, 2020

> It certainly warrants a test to document what the function is for.

You might be being serious, so I'll indulge. How would a test that will never break, for a function that would never be changed (because look at it), that lives in a different part of the directory structure, be worth even one second extra time or thought to write it, over-and-above the descriptive comment that is longer than the code itself, and lives discoverably right next to the code itself?

olliej · on Aug 17, 2020

Because that's how you stop regressions.

The cost of writing a single test case is not more than the cost of diagnosing what change broke your code for Turkish users.

Now there's always a point where maybe the infrastructure for testing that kind of thing doesn't exist, so writing your one "simple" test case takes a bunch of time, but on the plus side, future similar "simple" test cases will be easy at that point. And no one has to track down why your code is broken in turkey.

Many years ago while I worked on IM/IMEs on Mac and windows I spent maybe a week working on code that allowed an IME to be implemented in JS (within the webkit test harness), so it was possible to test, and prevent regressions that kept on being reintroduced by people changing layout and/or editing code in ways that are "obviously correct" for US/English keyboards. The win from that week is many regressions that were caught before they were even committed, and the ability to completely rewrite the text input system to support IMEs on non-mac platforms.

h0l0cube · on Aug 17, 2020

This isn't exactly addressing my point. Never did I say that 'All tests are useless'. Refer to my answers here:

https://news.ycombinator.com/item?id=24183844

xahrepap · on Aug 17, 2020

I've found that writing a unit test to verify it works at all is sometimes loads easier than manually running the app or whatever.

At that point, just leave the test and check it in.

h0l0cube · on Aug 17, 2020

I agree with this, kind of. Though I've found that keeping useless tests around doesn't always have 0 cost, same for any dead code.

Often, though, you can verify that your code works using a REPL.

I'm also a huge fan of how doc-tests in some languages that make the tests are part of the documentation, and neatly 'cohere' them to the function itself. At which point I'm happy to leave them in, as the tests-as-documentation are harder to miss, and are actually instructive.

scubbo · on Aug 17, 2020

Comments can be ignored, moved, misinterpreted. A test asserts correct behaviour. The only way to make the build fail if someone has (incorrectly) replaced the "complex toLower" with the (incorrect) "built-in toLower" is to delete or change the corresponding test - which rings way more alarm bells than a vague recollection of "hey, didn't there used to be a comment around here that said we shouldn't change this?"

Machine-time to run tests is cheap (if it's not in your codebase, then I agree that your benefit calculus may be different) - human cognition and awareness to prevent mistakes is valuable.

h0l0cube · on Aug 17, 2020

Keep in mind, I'm not advocating for 'no testing, ever'. But surely there's a limit to where you say something is so trivial that it won't ever change. And it's with these preconditions I ask the question:

> a test that will never break, for a function that would never be changed

I'll refer you to the OPs low-tech to-lower function, which is reductively simple, and never should be altered, with a comment as to why.

> Comments can be ignored, moved, misinterpreted.

Don't disagree with this. Someone lacking competence and care may ignore a comment, is incapable of comprehension, or just moves things for no reason. This should be caught at review.

Tests aren't infallible either. They can be invalidated, disabled because someone lacking competence decided it was the easiest way to move forward. This should be caught at review.

Edit: I'll address some of your other points more specifically...

> The only way to make the build fail if someone has (incorrectly) replaced the "complex toLower" with the (incorrect) "built-in toLower" is to delete or change the corresponding test

In OP's precise example, a worse thing can happen. They can shrug their shoulders and just use the built-in directly. Human-context business-value explication has more powerful benefit here.

> which rings way more alarm bells than a vague recollection of "hey, didn't there used to be a comment around here that said we shouldn't change this?"

If no-one's reviewing when someone changes the actual code and it's adjacent comments, who's reviewing the changes to the tests?

> human cognition and awareness to prevent mistakes is valuable.

Extra code comes at a maintenance and cognition cost. Maybe one trivial test seems like a minor cost, but how about the maintenance of 1000s of tests that (ought to) always pass?

sedatk · on Aug 16, 2020

C# has a `ToLowerInvariant()` variety for that.

paranoidrobot · on Aug 16, 2020

Which iirc is an alias for ToLower on the en-us locale. (Same for the other C# *Invariant() methods)

GoblinSlayer · on Aug 17, 2020

It calls ToLower(CultureInfo.InvariantCulture);

Locale name is iv-IV.

lmm · on Aug 16, 2020

Comments are unreliable, you should use tooling to fix this forever. E.g. findbugs has a rule for this problem: http://findbugs.sourceforge.net/bugDescriptions.html#DM_CONV...

kentonv · on Aug 16, 2020

Yeah I probably wrote that comment the first few times I did this but it's hard to write it the 50th time.

Maybe I should have my own tolower() function that I can call so I only have to write the comment once but it just feels ridiculous somehow.

Natsu · on Aug 16, 2020

It's far more ridiculous to repeat yourself over and over instead of making a simple function that describes exactly what you want and why.

viraptor · on Aug 16, 2020

> but it's hard to write it the 50th time.

You know you wrote it 49 times before, but the person reading the code probably doesn't. It only changes your experience not everyone else's.

If it's the same codebase, just write that function :)

eitland · on Aug 16, 2020

Write the function, comment it!

Many of these are obvious to many people here, but some aren't.

Even I can admit that some of the stuff in this thread is not obvious at all.

random314 · on Aug 16, 2020

Why does it feel ridiculous?

kentonv · on Aug 16, 2020

Because I've already rewritten more of the standard library than is healthy.

I mean, it's clearly the right thing to do here but I can predict the conversation that will inevitably result... "You wrote your own tolower() function? Why?" "The standard one is horribly broken." "How could a function that lower-cases a letter be broken??? Jesus Kenton your NIH syndrome is out of control." "Sigh..."

(Slightly more seriously, any particular time I need to lower-case something, it takes 10 seconds to write out the code, but would take 10 minutes to find a good place to define a reusable function and exactly what its API should be, and so it never seems worth the effort in the moment. Just like how most messy code comes to be.)

random314 · on Aug 16, 2020

This conversation can be simply be avoided by copy pasting your original hacker news comment into the library function header.

I have noticed some coworkers have their ego gratified by being right while everyone else is wrong. Instead of simply explaining what they are doing when they are doing it, they will do something that looks wrong in a very noticeable way and wait for the backlash. The backlash gives them an opportunity to show everyone else how they were right while everyone else was wrong and also an opportunity to play victim. However, in SW development - it is not just the technical details - your behavior also matters in a big way.

In this particular case, the correct approach is to create your own library function with appropriate comments. This is why the concept of a library function was invented. It is its entire raison-d'etre. However, you are doing everything but that. Including providing justifications in hacker news comments instead of your source code.

Now inevitably, someone will change your inline code to use to_lower. This will give you an opportunity to scream bloody murder, show how other engineers don't really understand technical details, correct them and also play victim. Create a library utility with comments and link it in - End of story.

saagarjha · on Aug 17, 2020

I’m reminded of garbage collector blog posts where they do something stupid (“disable it”, “allocate a ballast”) and then get to spend a couple pages explaining why it worked for them.

kentonv · on Aug 16, 2020

Speaking of people wanting to gratify their ego by being right: Everyone on this thread trying to lecture me on software engineering? ¯\_(ツ)_/¯

random314 · on Aug 17, 2020

At least, you get to play victim :)

obmelvin · on Aug 17, 2020

Looks like you're only 10 times away from using this to having spent your 10 minutes ;)

But seriously, I'm not here to lecture you, but personally I'd appreciate having a teammate educate me on the undesired behavior and have a nice function I could use to ensure my own code doesn't break user input

nitrogen · on Aug 16, 2020

Most codebases I've worked with have a StringUtils.java, or .kt, or a str.c or utils.c. Maybe just start one. Interestingly I haven't needed it as much in Ruby.

But I too feel the cognitive (and social!) burden of introducing a new function. It's not just "where do I put this", but "how do I convince the team I know what I'm doing since 15 years of experience clearly isn't enough and developers (mostly rightly) ignore positional authority and seniority".

Izkata · on Aug 16, 2020

  #include <kentonv.h>

kentonv · on Aug 16, 2020

It's... It's called KJ...

https://github.com/capnproto/capnproto/tree/master/c++/src/k...

lathiat · on Aug 17, 2020

This project has the greatest sales pitch I've ever seen: https://github.com/capnproto/capnproto/blob/master/README.md -- combined with the project name it's just perfect.