The project relies on Rayon [1] for scheduling parallel tasks and Cranelift [2] to JIT the hot loops.
There are plenty of other interesting features like auto-FFI, bytecode caching (similar to Python's .pyc files), and "daemonize" mode (similar to mod_perl or FastCGI).
The one piece of advice I give new PhD students is to maintain a list of your references for a bibliography ahead of time. For every paper you read, copy the citation in BibTeX format and write a couple of sentences to remind yourself what the paper was about. Do this for every source, even if it doesn't seem important at the time.
Use Zotero and Better BibTeX. By all means type a comment so you know which ideas came from where, but I'm a big advocate of taking notes by hand when you really want to understand something, as opposed to reminding your future self about something you already understand.
Not within a PhD, but as a side project I work on a research project on Wikiversity about grammatical gender in French. It references a bunch of books and academic works, probably a hundred I'd guess. The most tedious work, though, is checking which nouns are used only in a single gender or do have some epicene or specific inflection used in the wild, and giving a reference that attests to that when it's not already so consensual that most general-public dictionaries would document the fact. For that, the research refers to thousands of webpages. I'm glad that most of the time I just need to drop the DOI, ISBN, or page URL and MediaWiki will handle the filing of the most relevant fields. It's not perfect: it currently generates the output with many different templates (some don't have an excerpt field), some required fields might be left blank, URLs to PDFs won't work, and so on. But all in all it makes the process of taking note of the reference quick without getting too much in my way. Creating a structured database out of it can certainly be done later.
Zotero and AI have this covered now. If there's one thing AI is good at, it's summarising crappily formatted papers. I've never understood the 2- and 3-column thing. Horrendous way to format something.
2-column format has narrower columns, which means that your eyes move more vertically than horizontally while reading it. That is considered conducive to “skimming” long texts if you’re a “speed reader”.
Do you mean that you’re using AI as a search engine for your local bibliography? I haven’t seen any AI plugins for Zotero.
I'm severely dyslexic and the columns are a massive hindrance for me, and I also cannot skim-read due to this and Meares-Irlen syndrome. So my dislike is not universally applicable, just personal experience.
On the Zotero front there are a bunch of AI plugins, but I've not used them. The premise is that you can speak to and ask your library questions; some are set up differently though. Personally I can fire a paper into an LLM and get a good idea of the content immediately, and then ask questions about it. It's more interactive and lets me get a better idea of it prior to reading it.
LLMs make too many mistakes when summarizing papers in their current state; I would never trust one to summarize a whole paper.
I only use it on a sentence or paragraph basis, otherwise it misses the point 90% of the time.
I would strongly advise against this use for the moment.
The important part of reading a paper is not only to extract general rules, but to build your own internal model. Without it you cannot effectively do research. The main interesting points are often in the subtleties of the details deep in the paper.
Internal thoughts that come easily to mind when I read:
- 'Oh, they used that equation, but it could also be interpreted totally differently. What happens if we change the point of view, does it make sense from this other perspective?'
- 'I see they claim to achieve better results than SOTA, but actually they compared with other methods that are not solving exactly the same problem. What shortcuts or changes did they have to make to obtain a fair comparison? Is it a fair comparison, and can I trust those numbers?'
- 'oh, the authors didn't realize that they solved this other problem, or did they realize but there was a block somewhere preventing it?'
- 'I like this trick to achieve that result, but at the same time it will prevent solving a whole class of other problems, so their method will not work in those cases.'
...
Also, notice that a paper IS a summary of multiple months or years of work, and researchers have already condensed it to the maximum to stay within the page limit; by summarizing a summary you will always miss many things.
I have found a lot of pearls casually buried in papers that there is no way a summary, either human or AI, would extract. Things like changing a method slightly, or recovering an old method to apply to a current problem, mentioned as if it were unimportant, but actually you have a project blocked in a similar spot.
Fair points, and likely why I'm not suited to academia either. I've just never really grokked the practice. Obviously I only have experience from bachelor's and master's, but it always seemed that you have an idea and the research is finding papers to back it up, and then some that might not. The work you do doesn't really seem to matter; it's secondary to the nonsense around "the literature".
I have a bachelor's of science (first) in computer science, and I'm currently doing a dissertation for a master's in cyber security, en route for a first, though that might change depending on the mark for this dissertation.
My experience with the bachelor's was that despite my project being derailed by the bullshit around formatting the document, doing "research" by searching the library for peer-reviewed papers that backed up my claims, etc.; I got an excellent mark. In short, I set out to make something and, due to the academic processes, failed to make anything, but because I was able to critically reflect on it, I got a good mark. A waste of time, unless all you wanted was a good mark.
For my master's, I know the project doesn't matter; I'm concentrating on the academic nonsense because that's where the marks are.
The work you were given in your undergraduate and master’s was not research, it was homework. The task was critical reflection, which is repeatable and achievable for students; whereas research is expensive, one off, and generally out of reach for undergrads, and requires intensive oversight by an experienced researcher.
The waste of time would be for a professor to train you up to be a researcher before you’ve proven you are ready, hence the homework assignments.
If that's the case, and researching is way above master's level, then how do you get onto a PhD? Genuine question. If everything I've done to date is a pale imitation of the real thing, how can I make a fair assessment as to whether I want to pursue a PhD?
You don’t really, which is why a lot of people become researchers only to discover they hate it. But that’s true of all things.
I think the way to know if you want to be a researcher is more along the lines of: do you like finding the answers to questions no one has thought to ask, let alone answered? If so, then it doesn’t really matter what training you’ve had or how much of the field you’ve experienced; you can focus on that bit as your guiding force.
No, it's not about master's versus PhD; it's about whether you did something new (the novelty aspect). It sounds to me like you did a coursework master's of some kind, which gave you some basic literature-analysis projects. This is like the first month of any research project, and it's there so you understand the context of the project. The actual work is doing the novel thing and dealing with the repeated failures.
My suggestion is do a summer research project, and see if you enjoy it. If no-one will take you on, reflect on why that is (and to me that's a strong reason not to do it).
There are a few. I use ZoteroGPT to extract things (e.g. methods, sample size, species, etc.) from a bunch of papers or a collection. I don't use it for summaries.
I feel like that's true when the font is insanely small, which I guess was good when people would print entire proceedings.
Reading two-column, super-small font on a computer is super annoying though, tbh.
The comments you write into Zotero are not what the paper is about (the abstract covers that well enough); they're about what you found interesting or useful about the paper.
I have had some fun exhuming my old LaTeX skills and assembling a BibTeX bibliography from which I automatically extract the right entries presented in whichever style is needed for a given paper and for my own (HTML) site. I even publish the collection in Zenodo in case useful to others. I use the 'annote' field for the reminder you suggest.
Ha! You just made me remember how much I used JabRef (open source bibtex reference app) back in 2004 when I did my PhD.
It was the best/worst 4 years of my life. I studied overseas (uk), met my future wife and got a PhD that really wasn't useful for much to me. Fortunately it was under a scholarship.
The lack of good tools for keeping research notes with good search is kind of mind-boggling. I have reverted to having a private website for myself that I run on my machine, using MkDocs, which comes close to what I would want.
Presumably the idea is that you put the relevant parts of the list in your thesis. You need to convince your examiner that you understand the background to the original research you did, and a solid reference list (with supporting text in the introductory/background section of your thesis) is part of doing that.
Personally I did the references at the end and didn't feel like I suffered from that decision, but the key references in my particular area were a relatively small and well-known set.
Hmm, yeah. I mean, you often see huge reference lists, which always makes me feel like the person can't possibly be actually well acquainted with the stuff being referenced. So who are you really fooling? It seems all very performative, though I guess I understand the motivation.
> I think it would be the best to start interpreting the query and start compilation in another thread
This technique is known as a "tiered JIT". It's how production virtual machines operate for high-level languages like JavaScript.
There can be many tiers, like an interpreter, baseline compiler, optimizing compiler, etc. The runtime switches into the faster tier once it becomes ready.
It’s also common for JITs to sprout a tier and shed a tier over time, as the last and first tiers shift in cost/benefit. If the first tier works better you delay the other tiers. If the last tier gets faster (in run time or code optimization) you engage it sooner, or strip the middle tier entirely and hand half that budget to the last tier.
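A minimal two-tier scheme along these lines can be sketched in a few lines of Python. Everything here is illustrative: the toy AST, the `HOT_THRESHOLD` value, and the `eval`-based "compile" step are stand-ins for real machinery, not how a production VM works. Tier 0 interprets; once a function is hot, tier 1 swaps in generated code.

```python
# Illustrative two-tier runtime: interpret until hot, then "compile".
HOT_THRESHOLD = 3

def interpret(node, env):
    """Tier 0: walk a tiny expression AST directly."""
    op = node[0]
    if op == "const":
        return node[1]
    if op == "var":
        return env[node[1]]
    if op == "add":
        return interpret(node[1], env) + interpret(node[2], env)
    if op == "mul":
        return interpret(node[1], env) * interpret(node[2], env)
    raise ValueError(op)

def emit(node):
    """Tier 1: lower the same AST to Python source text."""
    op = node[0]
    if op == "const":
        return repr(node[1])
    if op == "var":
        return node[1]
    sym = {"add": "+", "mul": "*"}[op]
    return f"({emit(node[1])} {sym} {emit(node[2])})"

class TieredFunction:
    def __init__(self, params, body):
        self.params, self.body = params, body
        self.calls = 0
        self.compiled = None  # populated once the function becomes hot

    def __call__(self, *args):
        self.calls += 1
        if self.compiled is None and self.calls >= HOT_THRESHOLD:
            src = f"lambda {', '.join(self.params)}: {emit(self.body)}"
            self.compiled = eval(src)  # stand-in for real codegen
        if self.compiled is not None:
            return self.compiled(*args)
        return interpret(self.body, dict(zip(self.params, args)))

# (x + 2) * y
f = TieredFunction(["x", "y"],
                   ("mul", ("add", ("var", "x"), ("const", 2)), ("var", "y")))
```

Both tiers compute the same answers; the only observable difference is which machinery runs, which is exactly the invariant a real tiered JIT has to preserve.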
I first encountered q/kdb+ at a quant job in 2007. I learned so much from the array semantics about how to concisely represent time-series logic that I can't imagine ever using a scalar language for research.
Fun fact: the aj (asof join) function was my inspiration for pandas.merge_asof. I added the extra parameters (direction, tolerance, allow_exact_matches) because of the limitations I kept hitting in kdb.
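For anyone who hasn't used it, a toy `merge_asof` call exercising those three parameters might look like this (integer times and made-up symbols, purely for illustration; both frames must be sorted on the `on` column):

```python
import pandas as pd

trades = pd.DataFrame({"time": [1, 5, 10], "sym": ["a", "a", "a"],
                       "price": [100.0, 101.5, 102.0]})
quotes = pd.DataFrame({"time": [0, 4, 9], "sym": ["a", "a", "a"],
                       "bid": [99.0, 100.0, 101.0]})

# For each trade, take the most recent quote at or before the trade time,
# per symbol, but only if it is no more than 2 time units old.
joined = pd.merge_asof(trades, quotes, on="time", by="sym",
                       direction="backward", tolerance=2,
                       allow_exact_matches=True)
```

Setting `direction="forward"` or `"nearest"` instead changes which side of the timestamp is searched, which is one of the generalizations over kdb's `aj` mentioned above.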
The aj function at its heart is a bin (https://code.kx.com/q/ref/bin/) search between the two tables, on the requested columns, to find the indices of the right table to zip onto the left table.
aj[`sym`time;t;q]
becomes
t,'(`sym`time _q)(`sym`time#q)bin`sym`time#t
The rest of the aj function internals are there to handle edge cases: missing columns and options for filling nulls.
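That bin-style search can be sketched in plain Python with `bisect`. This is a simplified single-column version (time only), whereas the real `aj` matches on several columns at once:

```python
from bisect import bisect_right

def asof_indices(left_times, right_times):
    """For each left time, the index of the last right time <= it,
    or -1 if there is none. right_times must be sorted ascending."""
    return [bisect_right(right_times, t) - 1 for t in left_times]

def asof_join(left, right, key):
    """Zip the matched right rows onto the left rows; left wins on
    key collisions, and unmatched left rows pass through untouched."""
    idx = asof_indices([row[key] for row in left],
                       [row[key] for row in right])
    return [(({**right[i]} if i >= 0 else {}) | row)
            for row, i in zip(left, idx)]
```

Each left row costs one O(log n) search, which is the same shape of work the `bin`-based q expression above performs.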
A lot of the joins can be distilled to the core operators/functions in a similar manner. For example the plus-join is
I couldn't figure out how Arthur's bin matched on symbol though, so I switched to a linear scan on the right table to record the last-seen index for each "by" element. While it worked, my hash table was messy because I relied on Python to handle a whole tuple as a key, which had some issues during initial testing.
The asof join I wrote for Empirical properly categorizes the keys before they are matched. That approach worked far better.
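A sketch of that last-seen scan, simplified to a single "by" column (hypothetical column names; both inputs assumed sorted on the time column):

```python
def grouped_asof(left, right, by, on):
    """Single-pass asof join: walk both tables in time order, keeping
    the most recent right row per 'by' key."""
    last_seen = {}  # by-key -> most recent right row seen so far
    out, j = [], 0
    for row in left:
        # advance the right-table cursor up to this left row's time
        while j < len(right) and right[j][on] <= row[on]:
            last_seen[right[j][by]] = right[j]
            j += 1
        match = last_seen.get(row[by], {})
        out.append({**match, **row})
    return out
```

Because the cursor only moves forward, the whole join is O(n + m) over pre-sorted inputs, trading the per-row binary search for a merge-style sweep.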
The article points out that tools like TLA+ can prove that a system is correct, but can't demonstrate that a system is performant. The author asks for ways to assess latency et al., which is currently handled by simulation. While this has worked for one-off cases, OP requests more generalized tooling.
It's like the quote attributed to Don Knuth: "Beware of bugs in the above code; I have only proved it correct, not tried it."
From my point of view, they cannot even prove that, because in most cases there is no validation that the TLA+ model actually maps to, e.g., the C code that was written.
I only believe in formal methods where there is a machine-validated path from model to implementation.
I had been thinking about this idea for a long time, but I doubt I'll be able to get around to it. After speaking with a friend this evening, I decided to just jot it down for anyone interested.
Basically, use copy-and-patch compilation in a vector language to fuse loops and avoid temporaries. It can be employed for a baseline compiler that will use less memory than an interpreter and will have a much lower startup cost than an optimizing compiler.
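The payoff of fusion can be shown without any of the copy-and-patch machinery. The point is just that the fused form runs one loop and materializes no intermediate vectors; this is a plain-Python stand-in, not the proposed compiler:

```python
# Unfused: each elementwise step materializes a temporary vector.
def unfused(xs):
    t1 = [x * 2 for x in xs]   # temporary #1
    t2 = [x + 1 for x in t1]   # temporary #2
    return sum(t2)

# Fused: one loop, no temporaries. A copy-and-patch baseline compiler
# would stitch pre-compiled stencils for '*2', '+1', and the reduction
# into a single loop body; here a generator stands in for that loop.
def fused(xs):
    return sum(x * 2 + 1 for x in xs)
```

Both compute the same result; the difference is memory traffic, which is exactly what a vector-language interpreter pays for and a fusing baseline compiler avoids.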
It can infer the column names and types from a CSV file at compile time.
Here's an example that misspells the "ask" column as if it were plural:
let quotes = load("quotes.csv")
sort quotes by (asks - bid) / bid
The error is caught before the script is run:
Error: symbol asks was not found
I had to use a lot of computer-science techniques to get this working, like type providers and compile-time function evaluation. I'm really proud of the novelty of it and even won Y Combinator's Startup School grant for it.
Unfortunately, it didn't go anywhere as a project. Turns out that static typing isn't enough of a selling point for people to drop Python. I haven't touched Empirical in four years, but my code and my notes are still publicly available on the website.
I love how you really expanded on the idea of executing code at compile time. You should be proud.
You probably already know this, but for people like me to switch, "all" it would take would be:
1. A plotting library like ggplot2 or plotnine
2. A machine learning library, like scikit
3. A dashboard framework like streamlit or shiny
4. Support for Empirical in my cloud workspace environment, which is Jupyter based, and where I have to execute all the code, because that's where the data is and has to stay due to security
Just like how Polars is written in Rust and has Python bindings, I wonder if there's a market for 1 and 2 written in Rust and then having bindings to Python, Empirical, R, Julia etc. I feel like 4 is just a matter of time if Empirical becomes popular, but I think 3 would have to be implemented specifically for Empirical.
I think the idea of statically typed dataframes is really useful and you were ahead of your time. Maybe one day the time will be right.
The inferencing logic needs to sample the file, so (1) the file path must be determined at compile time and (2) the file must be available to be read at compile time. If either condition fails (for example, the filename is a runtime parameter), then the user must supply the type in advance.
There is no magic here. No language can guess the type of anything without seeing what the thing is.
Yeah, I think that's what limits the utility of such systems. Polars does type checking at query-planning time, so before you really do computation. I don't expect that much can improve over this model, due to the aforementioned limitations.
I think needing network access or file access at compile time is a semi-hard blocker for statically typed dataframes.
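The kind of pre-execution check being discussed can be sketched as a name check against a known schema (the `check_query` helper and schema are hypothetical; real planners check types and expressions too, not just column names):

```python
# Hypothetical planning-time check: reject a query that references
# columns absent from the table's schema, before any data is touched.
schema = {"sym": str, "bid": float, "ask": float}  # made-up table schema

def check_query(columns, schema):
    missing = [c for c in columns if c not in schema]
    if missing:
        raise NameError(f"symbol {missing[0]} was not found")
    return True
```

The catch raised above still applies: this only works once the schema is known, which is exactly what requires either compile-time file access or a user-supplied type.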
> Jobs are being eliminated within the IT function which are routine and mundane, such as reporting, clerical administration.
I've had a similar thought recently, that there is no demand for rote programmers. No employer is going to hand you a completed spec and tell you to code it up.
Software engineers and data scientists today must be innovative, understand the business they operate in, communicate with users, and work cross functionally. You’ve got to create something original and see it through without having to be told what to do at every step.
Personally, I'd classify jobs such as reporting and clerical administration more as 'admin' than IT. You do some of that as a SWE (tickets, SoWs, design docs) but that's not generally the focus of the work.
This op-ed was written by an undergrad and complains that Northeastern's switch to Python (from Racket) for its introductory classes will prevent students from learning fundamentals of computer science.
But that complaint can be made about any language! "This dynamically typed language won't allow students to understand type safety." "This high-level language won't allow students to learn pointers and systems programming." Etc.
I believe that an intro course should get students coding since the first major hurdle is learning how to construct any kind of program at all. The switch to a more "employable" language isn't going to make education worse.
Tell me you haven't read the article (or used racket) without telling me.
> I believe that an intro course should get students coding since the first major hurdle is learning how to construct any kind of program at all. The switch to a more "employable" language isn't going to make education worse.
None of this is the issue at hand. The switch to python is because industry uses it. The article correctly makes the point that racket was intentionally designed to get students coding as easily and quickly as possible. It has multiple steps of teaching languages for exactly that purpose, introducing concepts in ways that let students grapple with them one at a time in an interactive environment.
Meanwhile in python complex topics like duck typing, object oriented methods, exceptions, the distinction between iterables and lists, how to use a command line/terminal or how to configure an IDE, and so on must be covered before people can start writing code for the exercises. Racket is streamlined for beginners.
> Meanwhile in python complex topics like duck typing, object oriented methods, exceptions, the distinction between iterables and lists, how to use a command line/terminal or how to configure an IDE, and so on must be covered before people can start writing code for the exercises.
No, they don't have to be at all. You might as well suggest you need to learn the JVM before writing a line of Java.
Python supports imperative, OO, and functional programming paradigms. And to start you can use any text editor; an IDE is not required. In fact you can start working in the REPL right away, in which case all you need is a terminal and the command "python".
To quote the above person: "tell me you haven't read the article without telling me".
You think that supporting multiple "programming paradigms" is a nice thing, but it's the opposite for teaching beginning students. Experienced programmers want expressivity/customization/choices to do whatever they want. That's not what newbies need when they get stuck on an assignment.
In this case, you can find the same criticisms in published articles and books. I expect this student heard them straight from the source (the author of those articles or books). That does not lessen their impact or correctness, in my opinion.
[1] https://docs.rs/rayon/latest/rayon/
[2] https://cranelift.dev