• Welcome to the new Internet Infidels Discussion Board, formerly Talk Freethought.

Readability

Underseer

Contributor
Joined
May 29, 2003
Messages
11,413
Location
Chicago suburbs
Basic Beliefs
atheism, resistentialism
This is just intended as a thread for discussing what people consider more readable in code.

I needed to parse a first name out of a cell that contained first and last name in Excel.

Code:
D5 = "John Smith"

So the parse was

Code:
=LEFT(D5,FIND(" ",D5))

But whoops, this resulted in a first name with an extra space tacked on the end. Fencepost error! So which fix would you consider more readable?

Code:
=LEFT(D5,FIND(" ",D5)-1)

Or

Code:
=TRIM(LEFT(D5,FIND(" ",D5)))

Yeah, I know this is pretty fucking trivial, but I'm just curious about the attitudes other people have about readability. See, I'm not a professional programmer by trade. If I write code of any kinds, it's very short and once I write it, I probably won't look at it again for many years, so I tend to overcompensate on the side of clarity and explanations. I put way more comments than a pro would because I fucking need the extra comments.

Of course my preference would be for the TRIM() version just because if I look at this again years from now, I would probably think "What the fuck is that -1 doing there?" Sure, it would only take me a couple of seconds or less to figure out why, but with the TRIM() version it's immediately obvious what's going on and why. At least to me.

And yes, I'm aware that cell formulas in a spreadsheet do not count as "code," but I thought this illustrates a basic principle of readability.

Anyway, if you have any better examples of readability, especially anything that regularly bothers you, please feel free to offer them here.
 
Definitely the trim, it's far more apparent what the purpose is.

However, "readability" and "excel" do not belong in the same sentence!
 
I could go either way, but I think the more normal way to do it would be to minimize using any function calls you don't need to make. In my experience using add-ons like '-1' in find calls is pretty standard, so I think for any non-junior dev they'd be able to pick up on it pretty quickly. Whereas using TRIM is more a consequence of not using the FIND function correctly, rather than a call you actually need to make.

And to me, the reason for your TRIM is just as convoluted as the -1, so either way I'd need to do some deduction. So might as well just minimize your code.

Kind of reminds me of some comments I've heard about the use of 'DISTINCT' in SQL. An old manager of mine used to say that if you need to use DISTINCT your sql should actually be corrected so you don't need to use it at all. I'd think the same type of thing applies here.

That said, more important to readability is consistency. If your entire code-base does it one way or the other, keep doing the same.
 
I would choose =LEFT(D5,FIND(" ",D5)-1) because:
1. it's clear that the "-1" is an offset for the value returned by FIND.
2. it's probably a bit faster than calling TRIM

The hardest part is remembering what LEFT and FIND do in the first place and what arguments they take.

Alternatively, you could define a function called FIRSTWORD().
 
I agree with bigfield - I would probably write a custom function, and I would tend to shy away from adding a TRIM() function call.

That having been said, it is a fundamental truth that names don't comply with programmer's expectations.

Some people's first name is their family name. Is that what you want, in all such cases?

Where does the data come from? User input might well contain more than one space, and in some cases this will be intentional.

Back in the day, when computers were a bit of a novelty, my father received a computer generated letter from his bank. The envelope gave his full name, including various honorifics, above the address.

The letter inside began:

Dear Dr PhD,

I am writing to inform you...​

My firm advice is never to attempt to manipulate names.

Not everyone has one first name. Not everyone has one surname. Not everyone has more than zero of either. Any convention you assume to exist will prove to have exceptions that will deeply offend someone.
 
I could go either way, but I think the more normal way to do it would be to minimize using any function calls you don't need to make. In my experience using add-ons like '-1' in find calls is pretty standard, so I think for any non-junior dev they'd be able to pick up on it pretty quickly. Whereas using TRIM is more a consequence of not using the FIND function correctly, rather than a call you actually need to make.

And to me, the reason for your TRIM is just as convoluted as the -1, so either way I'd need to do some deduction. So might as well just minimize your code.

Kind of reminds me of some comments I've heard about the use of 'DISTINCT' in SQL. An old manager of mine used to say that if you need to use DISTINCT your sql should actually be corrected so you don't need to use it at all. I'd think the same type of thing applies here.

That said, more important to readability is consistency. If your entire code-base does it one way or the other, keep doing the same.

Good point. This was in a cell formula, but in a real programming language, that extra function call would have a performance cost in, say, a loop.
 
I could go either way, but I think the more normal way to do it would be to minimize using any function calls you don't need to make. In my experience using add-ons like '-1' in find calls is pretty standard, so I think for any non-junior dev they'd be able to pick up on it pretty quickly. Whereas using TRIM is more a consequence of not using the FIND function correctly, rather than a call you actually need to make.

And to me, the reason for your TRIM is just as convoluted as the -1, so either way I'd need to do some deduction. So might as well just minimize your code.

Kind of reminds me of some comments I've heard about the use of 'DISTINCT' in SQL. An old manager of mine used to say that if you need to use DISTINCT your sql should actually be corrected so you don't need to use it at all. I'd think the same type of thing applies here.

That said, more important to readability is consistency. If your entire code-base does it one way or the other, keep doing the same.

Good point. This was in a cell formula, but in a real programming language, that extra function call would have a performance cost in, say, a loop.

In a spreadsheet a formula often appears many times; and it may need to be recalculated on every row for every data change (although excel is usually smart enough to avoid this in simple workbooks), so performance degradation due to additional function calls is a possibility even when only using in-cell formulae.

It's very easy to specify an array in a workbook with very large total cell counts, particularly if there are a fair number of worksheets. Modern computers can generally just handle whatever you chuck at them, but it's not so long since I worked with Excel models that had several seconds of hourglass time for a recalculate (with the attendant risk of people setting calculation to 'manual', to avoid a pause between each cell during data entry, and then reading results without remembering to hit F9).

As an ancient fossil, I grew up with a healthy respect for the risks of inefficient code, and of inefficient workbook and model design. VBAExcel custom functions and macros can be pretty slow to run if you don't take pains to avoid needless steps.
 
As I read the OP and saw the "-1", my immediate thought was "why would you hardcode an offset when you can use a perfectly good Trim function".
What if there are multiple spaces? What if a TAB was used instead of a space?
 
As I read the OP and saw the "-1", my immediate thought was "why would you hardcode an offset when you can use a perfectly good Trim function".
What if there are multiple spaces? What if a TAB was used instead of a space?

The excel spreadsheet function TRIM() only removes spaces, not other whitespace characters like tabs.

Since the FIND() function finds the first space, there will only be one space at the end here.
 
Back
Top Bottom