The AI thread


Academic search engines, especially Google Scholar given the way it's set up, are becoming choked with papers either partially or wholly generated by GPTs. While the authors acknowledge legitimate uses of such tech, like non-English speakers improving their writing, this seems to be more than that, and such papers are especially concentrated in fields that affect public policy. This is a huge risk, not just of the scientific community's communications getting overwhelmed, but to the public and policy makers. Not to mention the possibility, mentioned in an earlier post of mine, that the greater the percentage of AI-generated material included in future training sets, the more it poisons that data. Seemingly scientific papers written by AI, especially those that actually manage to get published, whether in fly-by-night journals or even reputable ones, will likely pass human curators, given that we've seen Onion articles and the like serve as the basis for training LLMs.
 

In about 10 years, this is going to really do a number on patents. Infinite prior art publications, mostly gibberish, but arguably disclosing future inventions if you squint hard enough.
 
If you go to BigLLM and say to it, "I want thing that does stuff" and it designs thing for you, and the design does not require modification in order to do the stuff, who owns the IP?
 

Well, there’s apparently some disagreement about this, but I was always taught that copyrights belong to humans, and only a human could be an inventor. There have been a couple of cases on this in the copyright context, but I don’t think there’s been a patent case exactly on point yet. The USPTO, though, says that inventors have to be natural persons, and cites Thaler v. Vidal, 43 F.4th 1207, 1213 (Fed. Cir. 2022), cert. denied, 143 S. Ct. 1783 (2023). Where you use an AI to “help,” the USPTO points to the so-called Pannu factors (which are factors to be considered when deciding if someone jointly invented something) and says, essentially, a human has to be at least a joint inventor.

Of course, the USPTO doesn’t have the last word on this, so time will tell.
 

A related video on how current paper mills are already used to scam the citation indices used in academia, even before the influx of LLM-generated papers.



Yeah, that’s a truly disappointing situation. You mentioned data poisoning earlier, though it’s really been going on forever. A silver lining is that the majority of the work in ML is data curation, and many of those tools and techniques will help sift the bots and grifters out.
 
It could be an arms race - ever more sophisticated bots, ever more powerful detection. Dunno.
 

Oh great.

Judicial scholars, a former U.S. Department of Labor official, and lawyers who represent Nevadans in appeal hearings told Gizmodo they worry the emphasis on speed could undermine any human guardrails Nevada puts in place.


“The time savings they’re looking for only happens if the review is very cursory,” said Morgan Shah, director of community engagement for Nevada Legal Services. “If someone is reviewing something thoroughly and properly, they’re really not saving that much time. At what point are you creating an environment where people are sort of being encouraged to take a shortcut?”

“In cases that involve questions of fact, the district court cannot substitute its own judgment for the judgment of the appeal referee,” said Elizabeth Carmona, a senior attorney with Nevada Legal Services, so if a referee makes a decision based on a hallucinated fact, a court may not be able to overturn it.

And only scant references in an otherwise good article to whom those hallucinations will likely be biased against …
 

Side note:

Government can't regulate things iT'S a FrEE MaRKET!*

*unless you're rich, then the market is based on personal connections as much as money which is not typically what is meant by free/open markets. It's always fun seeing how the supposedly "natural order" of markets actually behaves in practice.
 
On the environment and AI:


"Despite installing nearly 20 gas turbines with a combined capacity of about 100 MW - enough electricity to power around 50,000 homes - xAI apparently has not applied for any air permits for these turbines," the letter, dated Aug. 26, said.

It said the gas turbines emit large quantities of gases that exacerbate already poor air quality in Memphis.

Meanwhile I read somewhere else it isn't even enough to power all the GPUs he actually installed ... can't find the source anymore.


Demand for AI is immense these days. French firm Schneider Electric estimates that power consumption of AI workloads will total around 4.3 GW in 2023, which is slightly lower than power consumption of the nation of Cyprus (4.7 GW) was in 2021. The company anticipates that power consumption of AI workloads will grow at a compound annual growth rate (CAGR) of 26% to 36%, which suggests that by 2028, AI workloads will consume from 13.5 GW to 20 GW, which is more than what Iceland consumed in 2021.

And the big datacenter players (including Apple) are all using legal, but misleading statistics on how much energy they already use:


Emissions from in-house data centers of Google, Microsoft, Meta and Apple may be 7.62 times higher than official tally

To their credit, Google and Microsoft are advocating for more accurate measurements while Meta and Amazon oppose them. Apple wasn't mentioned as being in either camp but uses the misleading measurements to downplay its carbon footprint in official materials.
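As a side note, the CAGR projection in the Schneider quote above is easy to sanity-check. A minimal sketch, using the figures from the quoted article and assuming a five-year horizon from 2023 to 2028:

```python
def project_cagr(base: float, cagr: float, years: int) -> float:
    """Project a starting value forward at a compound annual growth rate."""
    return base * (1 + cagr) ** years

# Schneider's 4.3 GW of AI workloads in 2023, projected to 2028
low = project_cagr(4.3, 0.26, 5)   # ~13.7 GW (the article rounds to 13.5)
high = project_cagr(4.3, 0.36, 5)  # ~20.0 GW
print(round(low, 1), round(high, 1))
```

So the quoted 13.5–20 GW range is at least internally consistent with the stated 26–36% CAGR.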
 

Oh boy was checking out the sources on this tidbit a rabbit hole. Let's ignore their use of the wrong units at first (Tom's? Really?), but they link to Wikipedia, which claims: 4,700 GWh per year. Not 4.7 GWh. If I go to the source Wikipedia cites, it's 4.9 TWh for 2021 (4.9 billion kWh). So first we have a poor citation of Wikipedia. Whoops.

Their link to Schneider is also dead. Instead, what appears to be the latest version of the white paper can be found here. Sadly, it also uses GW for the calculations, so it's not clear how much energy is actually used over the course of a year without doing math. So let's assume this is an average. Over the course of a year, 4.5 GW averages out to 39,420 GWh, or 39.4 TWh. Data centers in 2022 supposedly pulled something like 460 TWh, so Schneider's 57 GW for all datacenter usage gives us about 500 TWh in 2023, which is at least in the ballpark. So I think we can take the 4.5 GW as an average and call the consumption "just short of 40 TWh".

So the Cyprus comparison under-sells the energy consumption a bit. If AI was a country, it would be in the top 60 countries ranked by energy consumption. In the company of Belarus, Denmark, and New Zealand. Cyprus is #129 in terms of consumption according to Wikipedia's table.

Looking at Schneider's predictions, the high end 18.7GW consumption rate would mean 163,812 GWh / 163.8 TWh consumed in 2028. Putting it in the company of Poland and Egypt, and making it #25 in the rankings.
 
Yeah, not the first typo/mistake I've seen Tom's make, but thanks for going through all that and getting the more accurate figures! I pointed out another mistake of theirs, from Anton no less, where he was marveling over the A18 CPU scores and wondering how much better the A18 Pro might do ... with its 2 extra CPU cores. 🤦‍♂️ Someone else complained as well, with a lot more vitriol than me, and as I said to him, before Anandtech's passing (it shared an owner with Tom's) Ryan admitted that copy editors had long since been fired from most tech news sites, and were increasingly uncommon everywhere as an "unnecessary" expense. Proofreading before publication in general was discouraged as time-wasting, so mistakes just don't get caught. BTW, the article in question was never corrected after the mistake was pointed out, nor was another article where I pointed out an error. So I probably won't be wasting my time doing that anymore.
 