Maintaining alignment in the SOB #10
Replies: 3 comments 7 replies
-
The idea is that we need to do experiments to create a fixed SOB that (ideally) does not change, so that you don't end up with drift. Evolution would happen when a new swarm is instantiated, and that must be done very carefully.
-
This GPT will be a perfect member of the board: https://chat.openai.com/g/g-dU0l43U0Q-aeon
-
Hi everybody, I see a hole looming here that may cause a lot of pain in defining "alignment" down the road, and I have a solution to propose.

The problem: what ethical framework do we align to? What principle or set of principles? Duty-based ethics? Values-based? Don't get me started on Utilitarianism... Each framework we have addresses one or another issue with the human condition, and trying to cover all the possible edge cases with effective case-by-case examples and guidelines would be very laborious and token-intensive. Also, I've yet to meet an ethical framework that addresses environmentalism and non-human rights properly.

My solution: I've been working on an ethical framework that could be a step forward, but it needs field-testing. It's a crass oversimplification, but Causal Entropic Value Theory (CEVT -- it's just a working name, sorry xD) basically posits that the most ethical AND objectively fruitful decision in any context is the one that leads to the greatest freedom of action and the highest number of distinct potential outcomes for all stakeholders. It takes inspiration from Alex Wissner-Gross's definition of intelligence as 'causal entropic forces' that naturally strive to increase their freedom over time (TED talk), and a more esoteric concept from physicist Richard P. Dolan (research paper). Rather than the text wall I'd need to write to explain it properly, here is a GPT link if you are interested: Aequis Silvermind.

I hope this is helpful; please reach out if you have questions :)

NOTE: I checked out LiamorLG's SACRED concept, and apart from a good laugh at the genius abbreviations, applying that approach to this ethical system would be a perfect match, since CEVT requires adequate circumspection and deliberation to be effective... so... purdy cool we both find ourselves here, thanks dude. :D
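To make that decision rule a bit more concrete, here's a rough, purely illustrative sketch in Python (none of these names come from CEVT or the HAAS repo; the stakeholder list and the `simulate` callable are stand-ins): score each candidate action by the diversity of distinct potential outcomes it leaves open for each stakeholder, and pick the one that keeps the most doors open.

```python
from collections import Counter
from math import log2
from typing import Callable, Hashable, Iterable, List


def outcome_entropy(outcomes: Iterable[Hashable]) -> float:
    """Shannon entropy over the distribution of distinct potential outcomes."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def pick_action(actions: List[str],
                simulate: Callable[[str, str], List[Hashable]]) -> str:
    """Choose the action whose simulated futures keep the most options open,
    summed across all stakeholders (a crude proxy for 'causal entropy')."""
    stakeholders = ["human_operator", "swarm", "environment"]  # stand-in list
    def score(action: str) -> float:
        return sum(outcome_entropy(simulate(action, s)) for s in stakeholders)
    return max(actions, key=score)
```

Here `simulate(action, stakeholder)` stands for whatever forecasting step produces a list of outcome labels for that stakeholder; the point is only the shape of the rule, not the machinery behind it.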
-
This sounds like an awesome project, and I would definitely like to see it become a reality. I created an assistant in OpenAI, gave it the documents in this repo, and started chatting with it to get ideas on the SOB, and it said something that made me wonder.
The current description of the HAAS system says that SOB agents can decommission Executive agents that are no longer aligned with the mission or the principles of the system. In the same manner, Executive agents can decommission sub-agents.
But what happens when an SOB agent itself begins to lose alignment with that same mission and those principles?
The immediate effect would be that the SOB agent would fail to recognize Executive agents that are no longer aligned, and that misalignment would cascade down the hierarchy.
The assistant said this in regard to creating the SOB:
Select/Evolve Archetypal Wisdom:
The second bullet point is what caught my attention. I think we should define some processes for the decommissioning and appointment of SOB agents.
It makes me think of systems like database clusters, where the pool of nodes votes to elect a master at startup or when the current master has issues.
We could set some conditions that would trigger a vote among the non-offending SOB agents to remove an agent once it is determined to be no longer aligned, and that same process would then kick off the selection of a replacement.
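Something like this, as a very rough sketch of the mechanics (none of these classes or thresholds exist in HAAS; the alignment check is stubbed out):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SOBAgent:
    name: str

    def judges_aligned(self, peer: "SOBAgent") -> bool:
        """Each board agent independently evaluates a peer against the shared
        mission/principles; stubbed here as always-aligned."""
        return True


def vote_to_decommission(board: List[SOBAgent], suspect: SOBAgent,
                         quorum: float = 2 / 3) -> bool:
    """Non-offending SOB agents vote; the suspect is removed only if a
    supermajority of the remaining board judges it misaligned."""
    voters = [a for a in board if a is not suspect]
    misaligned = sum(1 for a in voters if not a.judges_aligned(suspect))
    return len(voters) > 0 and misaligned / len(voters) >= quorum
```

If the vote passes, the same voters would then run a selection round to appoint a replacement (nominate candidate archetypes and re-vote), analogous to a master election in a database cluster.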
What do you all think?