Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Had the same thing happen to me. Simple problem: find out if two strings are anagrams of each other.

My immediate solution: Sort the two strings and then compare them:

  (defn anagrams? [x y] (= (sort x) (sort y))) 
or some such. I was fortunate in that they didn't make me implement the sort (because it's been a long time for me) :)

"Ok, what's the efficiency of that solution?"

"Well, assuming the library sort functions I'm using are sane, I'll take a guess and say O(n log n)"

"Can you come up with a more efficient solution."

Off the top of my head, on the spot, I couldn't. Later, during the plane ride home, the obvious occurred to me: Don't sort the strings, just scan through each string once and build a hash table, keeping track of the number of occurrences of each letter in each string. Then compare the occurrences for each string. Not as elegant to express in code, but faster.

As it turns out, they later declined to hire me, giving me an excuse that made it clear they weren't really serious about hiring anyone for the position (as is the case at least half the time, it seems).

And thus ended my latest round of failed interviews. I cancelled the last one I had scheduled (probably another fake) and decided to take a break for a while (I mean, I do have a job right now, so it isn't urgent).



You need not to sort the strings. Create a vector with indices he ascii codes, incrementing for the first string and decrementing the count for the second, and keep a count of the number of chars, if you get a negative number exit false, else if the count of chars is zero then each one is an anagram of the other. (n+m+128) operations n and m are the length of both strings and 128 for creating the vector


Interviewer: "OK, how about with a unicode string?"


That sounds a lot like doing a radix sort of the strings, then comparing the run length encoded, sorted strings ;p


on a tangent, one way could be assign a number code to each alphabet. Add the numbers that occur in the strings. IF the sum matches, they are anagrams.


I don't see how the sum would be unique to a particular combination of letters.


Works if you assign prime numbers to each letter and multiply instead. So a=2, b=3, etc.


To get decent anagram lengths and complexity, implement the numbers as a dict of repetitions of primes, and implement the multiplication by summing the repetitions. ;-)


In which case you can just compare the dicts without performing the multiplication (which happens to be the costliest part for arbitrary-precision integers).


Exactly.


I briefly mused about just summing the ASCII codes for each letter in the strings. But quickly discarded the idea for this reason :)


“ad” = “bc”

You’re right.


What if a=1, b=10, c=100 - etc? Assuming the strings were English words...


You would have to make sure the sum of any combination of all characters was unique. For example, if the code was the character number, a=1, b=2, etc, both "abc" and "bbb" would have the same sum.

So I think something silly like: character_code = len(string)*len(alphabet)^character_index should work.


You need to multiply the numbers, (and they must be prime) not add them.


a=1, b=2, c=3, ...

ac = 1 + 3 = 4 bb = 2 + 2 = 4


I guess the numbers should be primes... 4+1=3+2


10 = 5 + 5 = 7 + 3


that's right... even prime isn't enough


Prime is ok if it’s multiplied (there is one and only one prime factorization of a number).... but that can get to absurdly large numbers.

Still, consider the word ‘abe’ with a = 2, b = 3, c = 5, d = 7 and e = 11.

abe would then be 2^1 * 3^1 * 5^0 * 7^0 * 11^1. ‘abba’ would be 2^2 * 3^2. Each anagram would have a distinct value.

See also Gödel numbering and FRACTRAN.


That’s right... stupid me, it’s the base of public key cryptography (doh)


def anagram?(a,b) a.reverse == b end




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: