The AI thread


Academic search engines, and Google Scholar in particular given the way it's set up, are becoming choked with papers either partially or wholly generated by GPTs. While the authors acknowledge legitimate uses of such tech, like non-English speakers improving their writing, this seems to be more than that, and such papers are especially concentrated in fields that affect public policy. This is a huge risk not just to the scientific community's communications getting overwhelmed but to the public and policymakers. Not to mention the possibility, mentioned in an earlier post of mine, that the greater the percentage of AI-generated material included in future training sets, the more it poisons that data. Seemingly scientific papers written by AI, especially those that actually manage to get published, whether in fly-by-night journals or even reputable ones, will likely pass human curators, given that we've seen Onion articles and the like serve as the basis for training LLMs.
 


In about 10 years, this is going to really do a number on patents. Infinite prior art publications, mostly gibberish, but arguably disclosing future inventions if you squint hard enough.
 
If you go to BigLLM and say to it, "I want thing that does stuff" and it designs thing for you, and the design does not require modification in order to do the stuff, who owns the IP?
 

Well, there's apparently some disagreement about this, but I was always taught that copyrights belong to humans, and only a human could be an inventor. There have been a couple of cases on this in the copyright context, but I don't think there's been a patent case exactly on point yet. The USPTO, though, says that inventors have to be natural persons, and cites Thaler v. Vidal, 43 F.4th 1207, 1213 (Fed. Cir. 2022), cert. denied, 143 S. Ct. 1783 (2023). Where you use an AI to "help," the USPTO points to the so-called Pannu Factors (which are factors to be considered when deciding whether someone jointly invented something) and says, essentially, a human has to be at least a joint inventor.

Of course, the USPTO doesn’t have the last word on this, so time will tell.
 

A related video on how existing paper mills are used to game the citation indices relied on in academia, even before the influx of LLM-generated papers.

 


Yeah, that’s a truly disappointing situation. You mentioned data poisoning earlier, though it’s really been going on forever. A silver lining is that the majority of the work in ML is data curation, and many of those tools and techniques will help sift the bots and grifters out.
 
It could be an arms race - ever more sophisticated bots, ever more powerful detection. Dunno.
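
Just to give a flavor of what that sifting can look like at its crudest, here's a toy sketch - not any real pipeline; the boilerplate phrases and the repetition threshold are made up for illustration:

```python
import re
from collections import Counter

# Hypothetical tells; real curation pipelines use far more (and better) signals.
BOILERPLATE_PHRASES = (
    "as an ai language model",
    "i cannot fulfill that request",
    "certainly! here is",
)

def looks_generated(text: str, max_trigram_repeat: int = 5) -> bool:
    """Crude heuristic: flag text containing known LLM boilerplate or
    heavily repeated trigrams (a common sign of low-effort generation)."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in BOILERPLATE_PHRASES):
        return True
    words = re.findall(r"[a-z']+", lowered)
    trigrams = Counter(zip(words, words[1:], words[2:]))
    return bool(trigrams) and max(trigrams.values()) > max_trigram_repeat

print(looks_generated("As an AI language model, I cannot browse the internet."))  # True
print(looks_generated("Yellow jackets are not bees, despite what the posts say."))  # False
```

Detecting machine-written text reliably is famously hard, though, which is why I suspect it really would turn into that arms race.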
 

Oh great.

Judicial scholars, a former U.S. Department of Labor official, and lawyers who represent Nevadans in appeal hearings told Gizmodo they worry the emphasis on speed could undermine any human guardrails Nevada puts in place.


“The time savings they’re looking for only happens if the review is very cursory,” said Morgan Shah, director of community engagement for Nevada Legal Services. “If someone is reviewing something thoroughly and properly, they’re really not saving that much time. At what point are you creating an environment where people are sort of being encouraged to take a shortcut?”

“In cases that involve questions of fact, the district court cannot substitute its own judgment for the judgment of the appeal referee,” said Elizabeth Carmona, a senior attorney with Nevada Legal Services, so if a referee makes a decision based on a hallucinated fact, a court may not be able to overturn it.

And only scant references in an otherwise good article to the people those hallucinations will likely be biased against …
 

Side note:

Government can't regulate things iT'S a FrEE MaRKET!*

*unless you're rich, in which case the market is based on personal connections as much as money, which is not typically what is meant by free/open markets. It's always fun seeing how the supposedly "natural order" of markets actually behaves in practice.
 
On the environment and AI:


"Despite installing nearly 20 gas turbines with a combined capacity of about 100 MW - enough electricity to power around 50,000 homes - xAI apparently has not applied for any air permits for these turbines," the letter, dated Aug. 26, said.

It said the gas turbines emit large quantities of gases that exacerbate already poor air quality in Memphis.

Meanwhile, I read somewhere else that it isn't even enough to power all the GPUs he actually installed ... can't find the source anymore.


Demand for AI is immense these days. French firm Schneider Electric estimates that power consumption of AI workloads will total around 4.3 GW in 2023, which is slightly lower than power consumption of the nation of Cyprus (4.7 GW) was in 2021. The company anticipates that power consumption of AI workloads will grow at a compound annual growth rate (CAGR) of 26% to 36%, which suggests that by 2028, AI workloads will consume from 13.5 GW to 20 GW, which is more than what Iceland consumed in 2021.

And the big datacenter players (including Apple) are all using legal but misleading statistics on how much energy they already use:


Emissions from in-house data centers of Google, Microsoft, Meta and Apple may be 7.62 times higher than official tally

To their credit, Google and Microsoft are advocating for more accurate measurements while Meta and Amazon oppose them. Apple wasn't mentioned as being in either camp but uses the misleading measurements to downplay its carbon footprint in official materials.
 

Oh boy, was checking out the sources on this tidbit a rabbit hole. Let's ignore their use of the wrong units at first (Tom's? Really?), but they link to Wikipedia, which claims: 4,700 GWh per year. Not 4.7 GWh. If I go to the source Wikipedia cites, it's 4.9 TWh for 2021 (4.9 billion kWh). So first we have a poor citation of Wikipedia. Whoops.

Their link to Schneider is also dead. Instead, what appears to be the latest version of the white paper can be found here. Sadly, it also uses GW for the calculations, so it's not clear how much energy is actually used over the course of a year without doing math. So let's assume this is an average. Over the course of a year, 4.5 GW averages out to 39,420 GWh, or 39.4 TWh. Data centers in 2022 supposedly pulled something like 460 TWh, so Schneider's 57 GW for all datacenter usage gives us about 500 TWh in 2023, which is at least in the ballpark. So I think we can take the 4.5 GW as an average and call the consumption "just short of 40 TWh".

So the Cyprus comparison undersells the energy consumption a bit. If AI were a country, it would be in the top 60 countries ranked by energy consumption, in the company of Belarus, Denmark, and New Zealand. Cyprus is #129 in terms of consumption according to Wikipedia's table.

Looking at Schneider's predictions, the high-end 18.7 GW consumption rate would mean 163,812 GWh / 163.8 TWh consumed in 2028, putting it in the company of Poland and Egypt and making it #25 in the rankings.
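
For anyone who wants to redo that arithmetic, here's the conversion I used - a quick sketch that assumes the GW figures are year-round averages (my assumption above, not something Schneider spells out):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours

def avg_gw_to_twh_per_year(avg_power_gw: float) -> float:
    """Annual energy (TWh) implied by a constant average power draw (GW)."""
    return avg_power_gw * HOURS_PER_YEAR / 1_000  # GW * h = GWh; /1,000 = TWh

# Averages assumed in the post above
for label, gw in [("AI 2023", 4.5), ("all datacenters 2023", 57), ("AI 2028 high end", 18.7)]:
    print(f"{label}: {gw} GW -> {avg_gw_to_twh_per_year(gw):.1f} TWh/yr")
# AI 2023: 4.5 GW -> 39.4 TWh/yr
# all datacenters 2023: 57 GW -> 499.3 TWh/yr
# AI 2028 high end: 18.7 GW -> 163.8 TWh/yr
```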
 
Yeah, not the first typo/mistake I've seen Tom's make, but thanks for going through all that and getting the more accurate figures! I pointed out another mistake of theirs, from Anton no less, where he was marveling over the A18 CPU scores and wondering how much better the A18 Pro might do ... with its 2 extra CPU cores. 🤦‍♂️ Someone else complained as well, with a lot more vitriol than I did, and, well, as I said to him, before Anandtech's passing (common owner with Tom's) Ryan admitted that copy editors had long since been fired from most tech news sites - and were increasingly uncommon everywhere as an "unnecessary" expense - and that proofreading before publication in general was discouraged as time-wasting. So mistakes just don't get caught. BTW, the article in question was never corrected after the mistake was pointed out, nor was another article where I pointed out an error. So I probably won't be wasting my time doing that anymore.
 
MS is dedicated to developing elaborate large model bases, but also to doing it in, uh, environmentally responsible ways. To that end, they are pursuing an energy sourcing agreement that could result in the reopening of a nuclear power plant on the Susquehanna River near Harrisburg, PA. You may have heard of that particular plant, but, eh, it has been nearly half a century since it had a catastrophic systems failure, so it is probably all better now.
 
Meta hasn't been this groundbreaking since they announced the ability to search Facebook in 2013.

 
Meta hasn't been this groundbreaking

My question came about because of Meta (FB), but it could apply to all AI.

Due to drought conditions, WV has been very dry this summer and as a result, the yellow jacket population has increased quite a bit. The old trail system I rode for a decade or more has been full of complaints about people being stung.

The problem is all the FB pages dedicated to these trails have been inundated with complaints about "bees". They aren't bees, but yellow jackets.

At the top of many of these posts is the Meta AI summary of what is being discussed, and as a result, it thinks bees are all over the trails. Because that is what these idiots are posting.

Will the AI use the info in these posts as part of its data set going forward, or will it disregard/forget what it previously summarized?

How does it determine if the data is valid or not? In this case, it is not valid, because they aren't bees.
 



And this is why government regulations (with teeth) are needed - and not just in this industry. The consequences for committing business crimes should not be so easily factored in as a mere cost of doing business, whether by huge companies or by startups (effectively) backed by them.

Edit: Oh, and there were these gems too, just in case he didn't make it clear what an incredible asshole he is:





Edit: original link disappeared

This guy really is an asshole:

The host then followed up with, “Do you think we can meet AI’s energy needs without totally blowing out climate goals?” and Schmidt answered with, “We’re not going to hit the climate goals anyway because we’re not organized to do it — and the way to do it is with the ways that we’re talking about now — and yes, the needs in this area will be a problem. But I’d rather bet on AI solving the problem than constraining it and having the problem if you see my plan.”
 

A good summary of an Apple study on the inability of AI to do formal reasoning. Now, this can be accused of being a study on “is water wet?” But the consequences of it do run deeper, as the fragility of the current models and methods means they are incapable of being reliable agents. Also, it is important that it comes from Apple-backed engineers: academics and smaller companies might be more easily ignored. Apple, however, is not so easily ignored. It may be coincidence, but maybe not, that Apple also recently pulled out of talks to invest in OpenAI.

Gary Marcus does suggest neurosymbolic learning (similar to what Google did for an AI to learn geometric proofs) as a way forward.

This follows on the heels of several studies showing that even for programming, AI models have far more limited use than advertised - not none, by any stretch of the imagination: as a tool to aid in writing code and documentation they can still be useful, but there are downsides, especially to relying on them just to output correct code from a prompt. And doing the latter does not necessarily lead to increased productivity, especially compared to experienced programmers and especially once bugs are taken into account.
 
the fragility of the current models and methods means they are incapable of being reliable agents

And I will say that the engineers on the ground aren't unaware of this, and have been butting up against this longer than you might think. Doesn't stop the tech bros from pushing the tech though.


Nor does it help with learning. It's much like how writing notes out by hand seems to help retention better than typing them: it engages more/different parts of the brain, which helps with learning. Having an AI do it for you has much the same problem as typing, but arguably worse.

I am working in a space I'm unfamiliar with at the moment (I watched my team get shuffled from owning mobile apps to owning web code), and I actually had jokes made at my expense: why wasn't I using AI to get a task done faster?

Because I'd like to actually retain the knowledge of how stuff works so I can be faster, and remain experienced. I didn't get to where I was by not learning on the job.
 