The AI thread


Academic search engines, especially Google Scholar given the way it's set up, are becoming choked with papers either partially or wholly generated by GPTs. While the authors acknowledge legitimate uses of such tech, like non-English speakers improving their writing, this seems to be more than that, and such papers are especially concentrated in fields that affect public policy. This is a huge risk, not just of the scientific community's communications getting overwhelmed, but to the public and policy makers. Not to mention the possibility, mentioned in an earlier post of mine, that the greater the percentage of AI-generated material included in future training sets, the more it poisons that data. Seemingly scientific papers written by AI, especially those that actually manage to get published, whether in fly-by-night journals or even reputable ones, will likely pass human curators, given that we've seen Onion articles and the like serve as the basis for training LLMs.
 

In about 10 years, this is going to really do a number on patents. Infinite prior art publications, mostly gibberish, but arguably disclosing future inventions if you squint hard enough.
 
If you go to BigLLM and say to it, "I want thing that does stuff" and it designs thing for you, and the design does not require modification in order to do the stuff, who owns the IP?
 

Well, there’s apparently some disagreement about this, but I was always taught that copyrights belong to humans, and only a human could be an inventor. There have been a couple of cases on this in the copyright context, but I don’t think there’s been a patent case exactly on point yet. The USPTO, though, says that inventors have to be natural persons, and cites Thaler v. Vidal, 43 F.4th 1207, 1213 (Fed. Cir. 2022), cert. denied, 143 S. Ct. 1783 (2023). Where you use an AI to “help,” the USPTO points to the so-called Pannu factors (which are factors to be considered when deciding if someone jointly invented something) and says, essentially, a human has to be at least a joint inventor.

Of course, the USPTO doesn’t have the last word on this, so time will tell.
 

A related video on how current paper mills are already used to scam the citation indices used in academia, even before the influx of LLM-generated papers.



Yeah, that’s a truly disappointing situation. You mentioned data poisoning earlier, though it’s really been going on forever. A silver lining is that the majority of the work in ML is data curation, and many of those tools and techniques will help sift the bots and grifters out.
 
It could be an arms race - ever more sophisticated bots, ever more powerful detection. Dunno.
 

Oh great.

Judicial scholars, a former U.S. Department of Labor official, and lawyers who represent Nevadans in appeal hearings told Gizmodo they worry the emphasis on speed could undermine any human guardrails Nevada puts in place.


“The time savings they’re looking for only happens if the review is very cursory,” said Morgan Shah, director of community engagement for Nevada Legal Services. “If someone is reviewing something thoroughly and properly, they’re really not saving that much time. At what point are you creating an environment where people are sort of being encouraged to take a shortcut?”

“In cases that involve questions of fact, the district court cannot substitute its own judgment for the judgment of the appeal referee,” said Elizabeth Carmona, a senior attorney with Nevada Legal Services, so if a referee makes a decision based on a hallucinated fact, a court may not be able to overturn it.

And only scant references in an otherwise good article to whom those hallucinations will likely be biased against …
 

Side note:

Government can't regulate things iT'S a FrEE MaRKET!*

*unless you're rich, then the market is based on personal connections as much as money which is not typically what is meant by free/open markets. It's always fun seeing how the supposedly "natural order" of markets actually behaves in practice.
 
On the environment and AI:


"Despite installing nearly 20 gas turbines with a combined capacity of about 100 MW - enough electricity to power around 50,000 homes - xAI apparently has not applied for any air permits for these turbines," the letter, dated Aug. 26, said.

It said the gas turbines emit large quantities of gases that exacerbate already poor air quality in Memphis.

Meanwhile I read somewhere else it isn't even enough to power all the GPUs he actually installed ... can't find the source anymore.


Demand for AI is immense these days. French firm Schneider Electric estimates that power consumption of AI workloads will total around 4.3 GW in 2023, which is slightly lower than power consumption of the nation of Cyprus (4.7 GW) was in 2021. The company anticipates that power consumption of AI workloads will grow at a compound annual growth rate (CAGR) of 26% to 36%, which suggests that by 2028, AI workloads will consume from 13.5 GW to 20 GW, which is more than what Iceland consumed in 2021.

And the big datacenter players (including Apple) are all using legal, but misleading statistics on how much energy they already use:


Emissions from in-house data centers of Google, Microsoft, Meta and Apple may be 7.62 times higher than official tally

To their credit, Google and Microsoft are advocating for more accurate measurements while Meta and Amazon oppose them. Apple wasn't mentioned as being in either camp but uses the misleading measurements to downplay its carbon footprint in official materials.
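As a side note, the CAGR projection in the Schneider quote above is easy to sanity-check. A minimal sketch, using the figures from the quoted article and assuming a five-year horizon from 2023 to 2028:

```python
def project_cagr(base: float, cagr: float, years: int) -> float:
    """Project a starting value forward at a compound annual growth rate."""
    return base * (1 + cagr) ** years

# Schneider's 4.3 GW of AI workloads in 2023, projected to 2028
low = project_cagr(4.3, 0.26, 5)   # ~13.7 GW (the article rounds to 13.5)
high = project_cagr(4.3, 0.36, 5)  # ~20.0 GW
print(round(low, 1), round(high, 1))
```

So the quoted 13.5–20 GW range is at least internally consistent with the stated 26–36% CAGR.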
 

Oh boy was checking out the sources on this tidbit a rabbit hole. Let's ignore their use of the wrong units at first (Tom's? Really?), but they link to Wikipedia, which claims: 4,700 GWh per year. Not 4.7 GWh. If I go to the source Wikipedia cites, it's 4.9 TWh for 2021 (4.9 billion kWh). So first we have a poor citation of Wikipedia. Whoops.

Their link to Schneider is also dead. Instead, what appears to be the latest version of the white paper can be found here. Sadly, it also uses GW for the calculations, so it's not clear how much energy is actually used over the course of a year without doing math. So let's assume this is an average. Over the course of a year, 4.5 GW averages out to 39,420 GWh, or 39.4 TWh. Data centers in 2022 supposedly pulled something like 460 TWh, so Schneider's 57 GW for all datacenter usage gives us about 500 TWh in 2023, which is at least in the ballpark. So I think we can take the 4.5 GW as an average and call the consumption "just short of 40 TWh".

So the Cyprus comparison under-sells the energy consumption a bit. If AI was a country, it would be in the top 60 countries ranked by energy consumption. In the company of Belarus, Denmark, and New Zealand. Cyprus is #129 in terms of consumption according to Wikipedia's table.

Looking at Schneider's predictions, the high end 18.7GW consumption rate would mean 163,812 GWh / 163.8 TWh consumed in 2028. Putting it in the company of Poland and Egypt, and making it #25 in the rankings.
 
Yeah, not the first typo/mistake I've seen Tom's make, but thanks for going through all that and getting the more accurate figures! I pointed out another mistake of theirs, from Anton no less, where he was marveling over the A18 CPU scores and wondering how much better the A18 Pro might do ... with its 2 extra CPU cores. 🤦‍♂️ Someone else complained as well, with a lot more vitriol than me, and as I said to him, before Anandtech's passing (it shared an owner with Tom's) Ryan admitted that copy editors had long since been fired from most tech news sites, and were increasingly uncommon everywhere as an "unnecessary" expense. Proofreading before publication in general was discouraged as time-wasting, so mistakes just don't get caught. BTW, the article in question was never corrected after the mistake was pointed out, nor was another article where I pointed out an error. So I probably won't be wasting my time doing that anymore.
 