Seems like it would be even better to put this in its own function with a descriptive name, ascii_tolower or roman_tolower or whatever, that has exactly the semantics you want.
This is exactly right, is and is a great example of what self-documenting code can be. The function itself could have a bit more explanation but any code calling it is going to be obvious.
The big difference is it looks deliberate, instead of just code written by someone trying to micro-optimize, be very clever, or who just didn't realize tolower() exists. Most people will pause before just replacing it, and likewise it should trigger questions in the PR.
> It certainly warrants a test to document what the function is for.
You might be being serious, so I'll indulge. How would a test that will never break, for a function that would never be changed (because look at it), that lives in a different part of the directory structure, be worth even one second extra time or thought to write it, over-and-above the descriptive comment that is longer than the code itself, and lives discoverably right next to the code itself?
The cost of writing a single test case is not more than the cost of diagnosing what change broke your code for Turkish users.
Now there's always a point where maybe the infrastructure for testing that kind of thing doesn't exist, so writing your one "simple" test case takes a bunch of time, but on the plus side, future similar "simple" test cases will be easy at that point. And no one has to track down why your code is broken in turkey.
Many years ago while I worked on IM/IMEs on Mac and windows I spent maybe a week working on code that allowed an IME to be implemented in JS (within the webkit test harness), so it was possible to test, and prevent regressions that kept on being reintroduced by people changing layout and/or editing code in ways that are "obviously correct" for US/English keyboards. The win from that week is many regressions that were caught before they were even committed, and the ability to completely rewrite the text input system to support IMEs on non-mac platforms.
I agree with this, kind of. Though I've found that keeping useless tests around doesn't always have 0 cost, same for any dead code.
Often, though, you can verify that your code works using a REPL.
I'm also a huge fan of how doc-tests in some languages that make the tests are part of the documentation, and neatly 'cohere' them to the function itself. At which point I'm happy to leave them in, as the tests-as-documentation are harder to miss, and are actually instructive.
Comments can be ignored, moved, misinterpreted. A test asserts correct behaviour. The only way to make the build fail if someone has (incorrectly) replaced the "complex toLower" with the (incorrect) "built-in toLower" is to delete or change the corresponding test - which rings way more alarm bells than a vague recollection of "hey, didn't there used to be a comment around here that said we shouldn't change this?"
Machine-time to run tests is cheap (if it's not in your codebase, then I agree that your benefit calculus may be different) - human cognition and awareness to prevent mistakes is valuable.
Keep in mind, I'm not advocating for 'no testing, ever'. But surely there's a limit to where you say something is so trivial that it won't ever change. And it's with these preconditions I ask the question:
> a test that will never break, for a function that would never be changed
I'll refer you to the OPs low-tech to-lower function, which is reductively simple, and never should be altered, with a comment as to why.
> Comments can be ignored, moved, misinterpreted.
Don't disagree with this. Someone lacking competence and care may ignore a comment, is incapable of comprehension, or just moves things for no reason. This should be caught at review.
Tests aren't infallible either. They can be invalidated, disabled because someone lacking competence decided it was the easiest way to move forward. This should be caught at review.
Edit: I'll address some of your other points more specifically...
> The only way to make the build fail if someone has (incorrectly) replaced the "complex toLower" with the (incorrect) "built-in toLower" is to delete or change the corresponding test
In OP's precise example, a worse thing can happen. They can shrug their shoulders and just use the built-in directly. Human-context business-value explication has more powerful benefit here.
> which rings way more alarm bells than a vague recollection of "hey, didn't there used to be a comment around here that said we shouldn't change this?"
If no-one's reviewing when someone changes the actual code and it's adjacent comments, who's reviewing the changes to the tests?
> human cognition and awareness to prevent mistakes is valuable.
Extra code comes at a maintenance and cognition cost. Maybe one trivial test seems like a minor cost, but how about the maintenance of 1000s of tests that (ought to) always pass?
Because I've already rewritten more of the standard library than is healthy.
I mean, it's clearly the right thing to do here but I can predict the conversation that will inevitably result... "You wrote your own tolower() function? Why?" "The standard one is horribly broken." "How could a function that lower-cases a letter be broken??? Jesus Kenton your NIH syndrome is out of control." "Sigh..."
(Slightly more seriously, any particular time I need to lower-case something, it takes 10 seconds to write out the code, but would take 10 minutes to find a good place to define a reusable function and exactly what its API should be, and so it never seems worth the effort in the moment. Just like how most messy code comes to be.)
This conversation can be simply be avoided by copy pasting your original hacker news comment into the library function header.
I have noticed some coworkers have their ego gratified by being right while everyone else is wrong. Instead of simply explaining what they are doing when they are doing it, they will do something that looks wrong in a very noticeable way and wait for the backlash. The backlash gives them an opportunity to show everyone else how they were right while everyone else was wrong and also an opportunity to play victim. However, in SW development - it is not just the technical details - your behavior also matters in a big way.
In this particular case, the correct approach is to create your own library function with appropriate comments. This is why the concept of a library function was invented. It is its entire raison-d'etre. However, you are doing everything but that. Including providing justifications in hacker news comments instead of your source code.
Now inevitably, someone will change your inline code to use to_lower. This will give you an opportunity to scream bloody murder, show how other engineers don't really understand technical details, correct them and also play victim. Create a library utility with comments and link it in - End of story.
I’m reminded of garbage collector blog posts where they do something stupid (“disable it”, “allocate a ballast”) and then get to spend a couple pages explaining why it worked for them.
Looks like you're only 10 times away from using this to having spent your 10 minutes ;)
But seriously, I'm not here to lecture you, but personally I'd appreciate having a teammate educate me on the undesired behavior and have a nice function I could use to ensure my own code doesn't break user input
Most codebases I've worked with have a StringUtils.java, or .kt, or a str.c or utils.c. Maybe just start one. Interestingly I haven't needed it as much in Ruby.
But I too feel the cognitive (and social!) burden of introducing a new function. It's not just "where do I put this", but "how do I convince the team I know what I'm doing since 15 years of experience clearly isn't enough and developers (mostly rightly) ignore positional authority and seniority".