This was 2021 (so pre-llm), but I used to work for a company that gathered data for training voice commands (Alexa, Toyota, Sonos, were some clients). Basically, we paid people to read digital assistant scripts at scale.
Your assumptions about training data do not match the demographics of data I collected. The majority of what our work revolved around was getting diversity into the training data. We specifically recruited kids, older folks, women, people with accented/dialected English and just about every variety of speech that we could get our hands on. The companies we worked with were insanely methodical about ensuring that different people were included.
You are reporting on a deliberately curated effort vs. what I understand is effectively voluntary data donation without incentives. It's not surprising to me that the latter dataset ends up biased due to the differences in sourcing.
You aren’t supposed to move the terminal on a residential plan, but there are plans for RVs, boats and planes that allow you to change location and/or use while in motion.
I had the RV plan when they said it would not work in motion, but it worked pretty well on the highway anyway.
The craziest part of this is that a school district thinks that the overnight location of the vehicle used to transport a student has anything to do with the location of the residence. Especially when that data is from a time period when the school isn't in session.
I can think of a half dozen valid scenarios why the vehicle used for school drop off is parked away from the student's residence at night.
e.g. Vehicle belongs to a non-custodial parent from out of district who handles drop off. Vehicle is used by a household member to do overnight shift work. Family just moved, of course their vehicle wasn't being parked in the district in July. ALPR character recognition error. Parent and student live elsewhere in the summer, and still qualify as residents within the district.
It sometimes boggles the mind how much inflexibility the people doing these jobs have, and are willing to use, especially in something so consequential.
Click through to the article you are commenting on; it's very clear. It is a link to the official government site for British Columbia, a large province encompassing the entire Pacific coast of Canada.
You absolutely can be, especially if you knew, or should have known, that the knife was likely to be used illegally.
While a bit more extreme than your example, there have been multiple cases where the parents of a school shooter have been held responsible because they provided access to a weapon when there were warning signs.
On the less extreme end of the spectrum, this is the same reason why you have to pretend that you are buying a "water pipe for tobacco" and not a bong if you don't want to get kicked out of the headshop (in places where that is still illegal).
You are missing the correlations that Claude can derive across all these sessions across all users. In Google Analytics, when I visit a page and navigate around until I find what I was looking for (or don't), that session data is important for website owners deciding how to optimize. Even in Google search results, when I click on the sixth link and not the first, it sends a signal about how to rearrange the results next time, or even personalize them. That same paradigm will be applicable here. This is network effects, personalization, and ranking coming together beautifully. Once Anthropic builds that moat, it will be irreplaceable. If you doubt it, ask all users to jump from WhatsApp to Telegram or Signal and see how difficult it is. When Anthropic gives you the best answer without asking too much, the experience is 100x better.
The underlying technology is a thin layer of queryable knowledge/“memories” between you and the LLM, which in turn gets added to the context of your message. Likely RAG. It can be as simple as an agents.md file that you give it permission to modify as needed. I really don’t think that they are correlating your “memories” with other people’s conversations. There is no way for the LLM to know what is or isn’t appropriate to share between sessions, at the moment. That functionality may exist in the future, but if you just export your preferences, it still works.
The moat - at this point in time - is really not as deep and wide as you are making it out to be. What you are imagining doesn’t exist yet. Indexing prior conversations is trivially easy at this point, you can do it locally using an api client right this moment.
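To illustrate how shallow the moat is: a minimal local "memory" layer can be sketched in a few lines of stdlib Python. This is not Anthropic's actual implementation, just an assumed toy design: snippets from past sessions are stored in a local JSON file (the filename and the word-overlap scoring are my own illustrative choices), the most relevant ones are retrieved, and they are prepended to the next prompt before it goes to the LLM.

```python
# Toy local memory layer: store past-session notes, retrieve the most
# relevant ones by simple word overlap, and prepend them to a new prompt.
import json
from pathlib import Path

MEMORY_FILE = Path("memories.json")  # hypothetical local store


def save_memory(text: str) -> None:
    """Append one remembered snippet to the local store."""
    memories = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    memories.append(text)
    MEMORY_FILE.write_text(json.dumps(memories))


def retrieve(query: str, k: int = 3) -> list[str]:
    """Return the k stored snippets sharing the most words with the query."""
    if not MEMORY_FILE.exists():
        return []
    memories = json.loads(MEMORY_FILE.read_text())
    query_words = set(query.lower().split())
    scored = sorted(
        memories,
        key=lambda m: len(query_words & set(m.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_prompt(user_message: str) -> str:
    """Prepend retrieved memories to the message before sending it to an LLM."""
    context = "\n".join(retrieve(user_message))
    return f"Relevant notes from past sessions:\n{context}\n\nUser: {user_message}"
```

A real RAG setup would use embeddings and a vector index instead of word overlap, but the shape is the same: retrieve, prepend, send.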
Besides all that, you will be shocked at how quickly a new service can reconstruct your preferences. I started a new YouTube account, and it was basically the same feed within a few days.
In any case, my feeling is that we should have learned at this point not to keep our data in someone else’s walled garden.
> Besides all that, you will be shocked at how quickly a new service can reconstruct your preferences. I started a new YouTube account, and it was basically the same feed within a few days.
Because your location data, wifi name, etc. hone in on the fact that this is the same person as before. You are actually supporting my point rather than denying it.
Few students do optional assignments, unfortunately. Other tasks that are directly worth a grade tend to take priority (e.g. studying for another class that has an exam this week).
1. Class attendance is frequently optional, but students still attend.
2. I had a prof. that didn't require homework be done. He would give out "practice fun" and would gladly sit down, give feedback and 1:1 time to those who completed it, or tried. He also pointed out that it was rare to pass the exams for students who didn't do "practice fun". Most people did the work.
It leads me to believe - from my own experience too - that students generally aren't stupid, and will gladly do the work if there is a point. Plenty of homework is pure busywork though, even at the college level.