In the /r/macapps subreddit, they have a huge influx of new app posts, and "whisper dictation" is one of the most saturated categories. [0]
>“Compare” - This is the most important part. Apps in the most saturated categories (whisper dictation, clipboard managers, wallpaper apps, etc.) must clearly explain their differentiation from existing solutions.
During my limited testing, it works better than I expected at handling multiple languages in a single session. Perhaps I just had a low expectation since I've mostly worked with English-only STT models.
Nothing unique, it's just taking a snapshot when it's processing the input. Even processing a single image will increase the TTFT by ~0.5s on my machine, so for now, it seems impossible to feed it live video and expect a real-time response.
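A back-of-the-envelope sketch of why that ~0.5s figure rules out live video. It assumes (this was not measured) that every frame costs the same ~0.5s and that frames are processed one at a time; `max_sustainable_fps` and `backlog_after` are made-up helper names for illustration only.

```python
def max_sustainable_fps(seconds_per_frame: float) -> float:
    """Highest frame rate that can be processed without falling behind."""
    return 1.0 / seconds_per_frame


def backlog_after(duration_s: float, camera_fps: float, seconds_per_frame: float) -> float:
    """Seconds of processing still queued after `duration_s` of live video."""
    frames_in = camera_fps * duration_s
    processing_needed = frames_in * seconds_per_frame
    # If processing keeps up with the camera, there is no backlog.
    return max(0.0, processing_needed - duration_s)


if __name__ == "__main__":
    cost = 0.5  # ~0.5 s per image, the figure quoted above (assumed constant)
    print(max_sustainable_fps(cost))    # 2.0 frames per second at best
    print(backlog_after(10, 30, cost))  # 140.0 s behind after 10 s of 30 fps video
```

Under those assumptions, anything above roughly 2 frames per second falls further behind the longer it runs, which matches the snapshot-per-turn approach.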
In regards to the video capability, I haven't tested it myself, but here's a benchmark/comparison from Google [0]
I totally get that these are very hard problems to solve and that we're on the bleeding edge of what's possible, but I can't help but wonder when someone is going to crack real video understanding.
Sure, maybe it's still frame-by-frame, but so fast and so often that the model retains a rolling context of what's going on and can cleanly answer temporal questions.
"how packages were delivered over the last hour", etc.
Huh that's weird. I just tried it and it works on my machine. Could you perhaps create a GitHub issue and share the reproduction steps and any relevant logs?
Don't have the time right now but will play around with it next weekend for sure and will give you more feedback with logs when I see that I can reproduce it.
For now what I did was:
- Tested in Chrome/Safari/Firefox on Tahoe.
- Followed the quick start install instructions from the GitHub repo
- Everything worked
- Closed terminal
- Disconnected internet (Wifi off)
- Opened terminal
- Started server again (uv run server.py)
- Opened localhost in browser, it asked for camera/mic normally, granted access, saw camera live feed but "loading..." at bottom center of the site and AI did not listen/respond
- Reproduced this about 3 times with switching between wifi on/off before starting the server, always the same (working with internet; not working without)
- Figured it also works fine if I start the server with internet connected and disconnect it afterwards
I read a UI book in the early 2000s that cited research showing that most users didn't understand filesystems. They would seem to, but then the idea that the same filename in two places was two unrelated files would just lead to a mental block. Those who got it, didn't find it hard. It's just that some people can't get it.
The disconnect is not between some developers, and the younger folks. It is between some developers, and most of the world.
I think a lot more people than most HN readers realize simply struggle significantly with abstract thinking and reasoning.
It's natural that people who enjoy programming and hacking and related fields are very comfortable with such abstract types of thought. But I really think that isn't all that common amongst most people. I think the average person has to learn such thinking abilities with difficulty (though they can). I'm sure many people here got into programming precisely because abstract thinking came easily to them.
> the idea that the same filename in two places was two unrelated files would just lead to a mental block.
Which is actually why the "files and folders" metaphor is apt. In a filing cabinet in a school office (once upon a time) there were likely hundreds of documents labeled "Report Card" in many different folders, each labeled with a different name.
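For anyone who wants to see the report-card analogy on an actual filesystem, here is a minimal sketch. The student names, grades, and the function name are made up for illustration; it only uses the standard library's `tempfile` and `pathlib`.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def same_name_different_files() -> Tuple[bool, bool, bool]:
    """Create two 'Report Card.txt' files in different folders and compare them."""
    with tempfile.TemporaryDirectory() as cabinet:
        root = Path(cabinet)
        # Hypothetical students; each gets their own folder in the "cabinet".
        for student, grade in [("alice", "A"), ("bob", "C")]:
            folder = root / student
            folder.mkdir()
            (folder / "Report Card.txt").write_text(f"Grade: {grade}\n")

        alice = root / "alice" / "Report Card.txt"
        bob = root / "bob" / "Report Card.txt"

        same_name = alice.name == bob.name      # both are "Report Card.txt"
        same_path = alice == bob                # but they live in different folders
        same_contents = alice.read_text() == bob.read_text()
        return same_name, same_path, same_contents


if __name__ == "__main__":
    print(same_name_different_files())  # (True, False, False)
```

Same name, different location, unrelated contents: the folder is part of the file's identity, just like the student's name on the physical folder.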
> I'm sure many people here got into programming precisely because abstract thinking came easily to them.
Counter here: When I wanted to switch from Turbo Pascal to C++ during school (at 14/15) (because it was "more cool" and it was the tool the 'big boy' game-dev pros were using, we thought), it was so damn hard for me - really! I was struggling so massively; I had massive problems with this pointer stuff - it took me years to fully understand it.
And I was hell-bad at math in school (or maybe just too lazy); the only thing I had a relation to was all this geometric stuff (because this was needed for .. game dev! :-D )
Pointers are famously difficult to learn and reason about even though the basic principles are simple. Programming in a style that requires direct manipulation of pointers when it's not actually necessary is usually regarded as unwise because it's so hard to get right.
OP had no problem with pointers prior to trying C++. I think there is a case to be made that C(++) makes pointers unnecessarily confusing, and that otherwise there is no real disconnect between understanding pointers in theory and in practice.
Pointers aren't hard, it's C/C++ that make them complicated. Addresses and indirection in any assembly language are simple and straightforward, easy and even intuitive once you start actually writing programs.
I used to think I was incapable of learning "real" programming because I didn't get C. When I later read a book on programming in assembly, I realized that everything that had felt so complex was actually not so difficult. C pointer syntax is weird and doesn't parse naturally for many people, especially programming novices who might not yet have a solid grasp on what/how/why they're doing anything.
> Which is actually why the "files and folders" metaphor is apt.
It's a starting point, but I certainly wouldn't say it's the best metaphor that there could be. The idea of subfolders just doesn't make sense in a filing cabinet analogy, because you have to consider paper size - any folder which could fit into another folder is not going to be able to contain your regularly sized documents.
People understand hierarchy. That named file is in a folder in a particular drawer of a particular cabinet in a particular room of a particular building in a particular neighborhood in a...
What some people struggle with is recursive hierarchy where each step doesn't change the kind of container. I guess they never saw a Matryoshka doll when they were little.
> The idea of subfolders just doesn't make sense in a filing cabinet analogy,
Sure it does. The document is located in Building C, Sub-basement 2, Room 123, cabinet 415, folder labeled "Accounts". And a physical folder can certainly contain other folders. Nit-picking the analogy wastes everyone's time.
I can't blame them. We've been force-upgraded to Windows 11 at work and that OS and its apps do their utmost to obscure where files are located.
I've frequently saved on OneDrive instead of locally, by accident, and then been perplexed when I try to reopen the file later.
And I've been using filesystems for 35+ years, so I feel sympathy for those who don't understand the abstraction. At this point Android is more transparent about its files.
> We've been force-upgraded to Windows 11 at work and that OS and its apps do their utmost to obscure where files are located.
That's because there's research that users don't understand filesystems. So then stupid companies who make bad decisions like Microsoft and Apple decide that that means they should pretend filesystems don't exist.
By that logic, operating system developers struggle to understand that putting two files with the same name into the same folder(1) is very much possible in the physical world.
(1) or referencing them from the same directory, which was the earlier metaphor.
I've seen two people with the same name and birthday, in different departments of the same building. Caused regular problems with management and HR.
I've also seen two different customers with the same name and phone number - the number got recycled and went to second one while the first hadn't updated their number on file. We had to tell them apart by address.
But why are filenames equated with spacetime coordinates? That doesn't make any sense - reflect on why you leaped to that analogy. The spacetime coordinates are the disk ID and sector number. We've been using operating systems that work a certain way for so long that we think filenames are like spacetime coordinates.
In the time it took you to write this comment, you've thought more about the abstraction than most of the people who are confused by it -- and such logic will never succeed in coaxing them out of their confusion. :)
I think that's perfectly understandable. File systems require the user to remember a hierarchy in their head (even if there are tools like breadcrumbs to help you out), and many people aren't willing or aren't able to hold an arbitrarily complex structure like that in their head. A name is a flat piece of information, no extra structure to imagine.
I once worked with a professor who used floppies for all his files (after they had been surpassed by thumb drives) because each floppy was essentially a single folder, and he could wrap his head around that conceptually.
> two unrelated files would just lead to a mental block
Because in the analog world, each "document usually has a single/unique headline" and file names are often perceived as some type of unique identifier as well, I'd guess?
> It is between some developers, and most of the world.
Not even the older generations. My parents save files in a WhatsApp chat, and my father is one who bought the first IBM PC when it came out, so someone who has touched these things for decades (tho very superficially).
I think that the software industry, especially in operating systems, has completely failed to provide a balanced product between the overly bloated and messed up (Windows), the overly complicated (Linux) and the overly simplified (Android/iOS).
Maybe some Linux distros are now at the right spot, I was positively surprised by PopOS to give an example, but it's too late. With AI this is only going to get worse.
That's becoming dangerously true of my wife and me as well, to be honest.
The friction is just so much lower than Google Drive or whatever. As long as I handle it right away. It's just finding something from more than an hour ago that's intolerable.
Also, you usually have context for the file. Like "Hey, can you send me this blueberry crumble recipe?".
I do this quite frequently. I know which person knows, I know I've asked them before and usually a quick keyword search is enough to find whatever I'm looking for again.
So this thing has at least two more information points I can search for to pinpoint the file than a simple file on my PC. It tells me who, and more context on what.
Last week I had lunch with a business partner who does some work for SME retail investors:
He showed me his WhatsApp: people are sending him _ALL_ types of critical documents by WhatsApp. Everything.
(and bank statements are among the class of "less critical" documents in his case)
My theory here is: "If you have any function in your product, people will use it for anything appropriate to them in a given minute"
To be fair, what other simple way is there to send a document to a contact through an e2ee channel? Mail + PGP/GPG? Wormhole?? openssl???
Sending it via WhatsApp (which also has desktop clients, btw) strikes me as a perfectly reasonable solution. (Which is somewhat of an indictment of the current state of cryptographic software, but that's a different topic.)
This exact scenario happened to me in a prior job. Invoices, payments, everything could be (and sometimes was) sent through WhatsApp. It was absolutely shocking to see people do this.
I witnessed a cop attempting to manipulate some files I provided to him on a thumb drive. It was a slow, laborious process of dragging files one at a time from the Windows image viewer to a shared folder. I would have liked to just do a Ctrl-A, Ctrl-C, Ctrl-V, but that was way above his level of thinking and he didn't seem like the type who wanted an education. So I just sat there through the long, painful process--and then at the end he completely screwed up the report. Idiot.
(17 yo here) I am eternally grateful to my cousins, who convinced my parents to give me a desktop computer that is still working right now (it had a minor hiccup in the processor recently, but it works). Before that, I had a 1 GB Win7 machine with a CRT monitor, on which I somehow ran VS Code smoothly.
I am very frugal (to save money on a webcam for online classes, I had a DroidCam/WO Mic setup with one of my parents' old phones, which was so old that online classes wouldn't work on it or were just too slow), but spending money on a decent personal computer is genuinely one of the best investments, personally.
One thing my cousins did that I am sorta grateful for in retrospect is that they didn't buy me a GPU, so my computer was really nice/smooth at everything but gaming. I still ran some games like the Portal series, Inscryption, and others like Valorant, and it was while playing Valorant that I started realizing that its Chinese company roots and kernel-level access meant there was no proper way to have peace of mind unless I reinstalled the system.
Since I was reinstalling anyway, and I had been watching some of The Linux Experiment's videos and was fascinated by Linux, I decided to try Nobara Linux for the first time, which was another one of the best decisions I've made, as it opened me up to the terminal.
> grateful for in retrospect is that they didn't buy me a GPU
Great sentence! I will apply this to my kids as well, I guess.
I always tell them already: "In the future, you can game as much as you want, IF you learn a good programming language [which will be defined by me]" - let me see how this will work out in 1-2 years :-D
The first thing my brothers did when I got the computer was change the wallpaper to a good mountain wallpaper, install VS Code, and ask me to write a Python program that prints in reverse, so 10 9 8 7 .. 1, each on a new line (iirc) [I was in 8th grade].
Then they asked me to square the numbers while reverse printing or something too, so printing 100 81 64 .. 1, each on a new line.
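For the curious, the two exercises look roughly like this in Python. This is just one way to write them (the function names are made up, and it's not what was actually written back then):

```python
from typing import List


def countdown(start: int = 10) -> List[int]:
    """Numbers from `start` down to 1."""
    return list(range(start, 0, -1))


def countdown_squares(start: int = 10) -> List[int]:
    """Squares of the numbers from `start` down to 1."""
    return [n * n for n in countdown(start)]


if __name__ == "__main__":
    # Exercise 1: 10 9 8 .. 1, each on a new line.
    for n in countdown():
        print(n)
    # Exercise 2: 100 81 64 .. 1, each on a new line.
    for square in countdown_squares():
        print(square)
```

The only real trick is the negative step in `range(start, 0, -1)`, which stops before 0 so the last value printed is 1.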
> let me see how this will work out in 1-2 years :-D
Keep me updated haha! To be honest, I'll admit that I'm not the greatest at coding itself right now, as much as I love tinkering with open source. Personally, I'm hoping to learn coding with more interest when I get into college; I will have 4 years to learn peacefully (well, hopefully, if I get into a decent college, that is) :D
For me the challenge after using Linux was that I wanted to use Arch Linux, because my brother (not cousin, real) once flexed his (iirc) DistroTube Arch Linux at me when we were eating something, and so I always considered Arch to be the final boss of Linux lol. So I decided to install it, and then I fell in love with Arch (currently on Cachy on desktop, but right now on a Mac which my brother gifted me :D)
On my birthday once, a long time ago (iirc in 5th or 6th grade, not sure), my brother gave me his laptop. I wanted to do Python, but Python wanted the admin password on Windows to install properly. So what I did (I don't even remember how) was download an operating system that could crack the Windows password so that I could set a new one, and I used that to set a new password and then install Python, only to print hello world :D (I think only because one of the cousins I really admire mentioned that he once wrote 2k LOC of Python, and at the time I thought Python was the endgame). We are talking about Windows 7, but I think Windows 10 security must've gotten better. So these are some of the things I have done. I wouldn't call it coding so much as tinkering, but I've loved doing these things for as long as I can remember :D
I think this all started because I tried pirating pokemon-yellow so that I could play it. My brother just told me to google it, or told me the word "rom" and asked me to figure it out. I was in 2nd or 3rd grade, maybe 4th grade lol, and I pirated it (Hope nintendo doesn't sue me now xD)
Sorry for making this long, but your comment somehow made me remember some things that I had forgotten/hadn't touched in a long time xD! I think the main takeaway is that I just treated all of these as challenges I guess, like I wanted to prove to myself that I could do it, or find out whether a thing was possible or not. I haven't done too much coding myself, so I just say that I am a tinkerer :D
I hope this can help you teach your kids what you mention. I mean, make it a challenge where, if they fail, they don't feel pressure, but they also feel just competitive enough to try their best :D And I think in some sense I personally just wanted some respect/to impress my elder cousins/brothers, as they were much older/more mature than me. It's also not all good, though, if you are much younger than most of your cousins.
The thing is, I don't have any measurable advice; a lot of what I have done till now is just unquantified. Coding, on the other hand, is quantifiable in some sense (it works or it doesn't). I just do things because I want to, and I think I still work the same way. Sometimes I wish the things I want were something measurable, but my mind doesn't work that way.
The thing that depresses me sometimes is that, at the end of the day, I am just a number to many if not all, including in a future job/business etc. Nobody who interviews me when I try to get a job some time from now is going to read much of this, and with AI and some genuine problems in the industry, like too many people, this problem gets even larger, sigh. So in that sense I just want to be happy sometimes.
Sorry for the long comment once again and the depressing end, but I recommend watching some cat videos though and I wish you and your kids to have a nice day! :D Say hi to them from my side!!
No. There is a disconnect between domain insiders and those that are not. This is not specific to any one domain. It's also not about age.
Some insiders know about this disconnect and fewer still can bridge it easily.
Those that cannot even sense this disconnect, they're a bit of a pain in certain situations. You know, like talking to project stakeholders or customers.
This is my stance as well, but keep in mind that a lot of people have the opposite preference.
They didn't grow up with the World Wide Web. They only started using technology when Android and iPhone were popular. They only know WhatsApp, YouTube, TikTok. They're not used to using the browser.
There's a meme that "Gen Z Kids Don't Understand How File Systems Work" [0]
There's a reason the "small web" is having a revival among these kids, because they increasingly haven't experienced a real web to begin with. Circa 2010, the web effectively died in the mainstream since Google decided it wasn't worth showing. Platforms became a thing and, despite being web-based, are practically their own intranets that use the web as a cross-platform, zero-install delivery platform.
When you say "meme", it sounds like it might not be true. But, a few years ago I handed my stepson a USB flash drive with some files on it, he plugged it into his laptop and the very first thing he did was launch Google Chrome and then not have any clue what to do to access the files (it was a Windows laptop).
Thank you. This reminds me of a paragraph from the LatentSpace newsletter [0]
> The excellent on device capabilities makes one wonder if these are the basis for the models that will be deployed in New Siri under the deal with Apple….
This app is cool and it showcases some use cases, but it still undersells what the E2B model can do.
I just made a real-time AI (audio/video in, voice out) on an M3 Pro with Gemma E2B. I posted it on /r/LocalLLaMA a few hours ago and it's gaining some traction [0]. Here's the repo [1]
I'm running it on a MacBook instead of an iPhone, but based on the benchmark here [2], you should be able to run the same thing on an iPhone 17 Pro.
Thanks for sharing! I'm still torn about it. Sure it'll feel more natural if you have the AI head animation, but I don't want people to get attached to it. I don't want to make the loneliness epidemic even worse.
Thanks! Although, I can't claim any credit for it. I just spent a day gluing together what other people have built. Huge props to the Gemma team for building an amazing model and also an inference engine that's focused on edge devices [0]