Merge branch 'main' into content-style-guide-update

github · Sep 8, 2024 · 793932e
2 parents ed94cc5 + a7a7e1e
commit 793932e

Showing 81 changed files with 1,336 additions and 2,508 deletions.
diff --git a/assets/images/enterprise/management-console/monitor-dash-link.png b/assets/images/enterprise/management-console/monitor-dash-link.png
diff --git a/assets/images/help/desktop/configure-custom-editor.png b/assets/images/help/desktop/configure-custom-editor.png
diff --git a/content/actions/about-github-actions/understanding-github-actions.md b/content/actions/about-github-actions/understanding-github-actions.md
@@ -9,14 +9,14 @@ redirect_from:
  - /actions/learn-github-actions/introduction-to-github-actions
  - /actions/learn-github-actions/understanding-github-actions
  - /actions/learn-github-actions/essential-features-of-github-actions
+ - /articles/getting-started-with-github-actions
 versions:
  fpt: '*'
  ghes: '*'
  ghec: '*'
 type: overview
 topics:
  - Fundamentals
-layout: inline
 ---
 
 {% data reusables.actions.enterprise-github-hosted-runners %}

diff --git a/...what-your-workflow-does/accessing-contextual-information-about-workflow-runs.md b/...what-your-workflow-does/accessing-contextual-information-about-workflow-runs.md
@@ -365,6 +365,12 @@ The contents of the `vars` context is a mapping of configuration variable names
 
 This example workflow shows how configuration variables set at the repository, environment, or organization levels are automatically available using the `vars` context.
 
+{% note %}
+
+Note: Configuration variables at the environment level are automatically available after their environment is declared by the runner.
+
+{% endnote %}
+
 {% data reusables.actions.actions-vars-context-example-usage %}
 
 ## `job` context

diff --git a/...aging-your-instance/monitoring-your-instance/accessing-the-monitor-dashboard.md b/...aging-your-instance/monitoring-your-instance/accessing-the-monitor-dashboard.md
@@ -27,20 +27,75 @@ shortTitle: Access the monitor dashboard
 
  ![Screenshot of the header of the {% data variables.enterprise.management_console %}. A tab, labeled "Monitor", is highlighted with an orange outline.](/assets/images/enterprise/management-console/monitor-dash-link.png)
 
-## Troubleshooting common resource allocation problems on your appliance
+1. In HA and cluster environments, you can switch between nodes by opening the dropdown and selecting a different hostname.
 
-{% note %}
+## Using the monitor dashboard
 
-**Note**: Because regularly polling {% data variables.location.product_location %} with continuous integration (CI) or build servers can effectively cause a denial of service attack that results in problems, we recommend using webhooks to push updates. For more information, see "[AUTOTITLE](/get-started/exploring-integrations/about-webhooks)".
+The page visualizes metrics which can be useful for troubleshooting performance issues and better understanding how your {% data variables.product.prodname_ghe_server %} appliance is being used. The data behind the graphs is gathered by the `collectd` service and sampled every 10 seconds.
 
-{% endnote %}
+Within the pre-built dashboard you can find various sections grouping graphs of different types of system resources.
 
-Use the monitor dashboard to stay informed on your appliance's resource health and make decisions on how to fix high usage issues.
+Building your own dashboard and alerts requires the data to be forwarded to an external instance, by enabling `collectd` forwarding. For more information, see "[AUTOTITLE](/admin/monitoring-and-managing-your-instance/monitoring-your-instance/configuring-collectd-for-your-instance)."
 
-| Problem | Possible cause(s) | Recommendations |
-| -------- | ----------------- | --------------- |
-| High CPU usage | VM contention from other services or programs running on the same host | If possible, reconfigure other services or programs to use fewer CPU resources. To increase total CPU resources for the VM, see "[AUTOTITLE](/admin/enterprise-management/updating-the-virtual-machine-and-physical-resources/increasing-cpu-or-memory-resources)." |
-| High memory usage | VM contention from other services or programs running on the same host | If possible, reconfigure other services or programs to use less memory. To increase the total memory available on the VM, see "[AUTOTITLE](/admin/enterprise-management/updating-the-virtual-machine-and-physical-resources/increasing-cpu-or-memory-resources)." |
-| Low disk space availability | Large binaries or log files consuming disk space | If possible, host large binaries on a separate server, and compress or archive log files. If necessary, increase disk space on the VM by following the steps for your platform in "[AUTOTITLE](/admin/enterprise-management/updating-the-virtual-machine-and-physical-resources/increasing-storage-capacity)." |
-| Higher than usual response times | Often caused by one of the above issues | Identify and fix the underlying issues. If response times remain high, contact us by visiting {% data variables.contact.contact_ent_support %}. |
-| Elevated error rates | Software issues | Contact us by visiting {% data variables.contact.contact_ent_support %} and include your support bundle. For more information, see "[Providing data to {% data variables.product.prodname_enterprise %} Support](/enterprise/{{ currentVersion}}/admin/guides/enterprise-support/providing-data-to-github-support#creating-and-sharing-support-bundles)." |
+## About the metrics on the monitor dashboard
+
+### System health
+
+The system health graphs provide a general overview of services and system resource utilization. The CPU, memory, and load average graphs are useful for identifying trends or times where provisioned resource saturation has occurred. For more information, see "[AUTOTITLE](/admin/monitoring-and-managing-your-instance/monitoring-your-instance/recommended-alert-thresholds)."
+
+### Processes
+
+The processes graph section looks deeper into the major individual services which make up the {% data variables.product.prodname_ghe_server %} appliance. Looking at these services individually can show how usage trends impact system resources over time.
+
+### Authentication
+
+The authentication graphs break down the rates at which users and applications are authenticating to the {% data variables.product.prodname_ghe_server %} appliance. We also track the protocol or service type such as Git or API for the authentications, which is useful in identifying broad user activity trends. The authentication graphs can help you find interesting trends or timeframes to look at when diving deeper into authentication and API request logs.
+
+### LDAP
+
+LDAP graphs will only display data if LDAP authentication is enabled on the {% data variables.product.prodname_ghe_server %} appliance. For more information, see "[AUTOTITLE](/admin/managing-iam/using-ldap-for-enterprise-iam/using-ldap)." These graphs can help you to identify slow responses from your LDAP server, as well as the overall volume of LDAP password-based authentications.
+
+### App servers
+
+The application servers section provides insight into the activity of {% data variables.product.prodname_ghe_server %} services which provide data to users and integrations.
+
+### App request/response
+
+The **App request/response** section looks at the rate of requests, how quickly those requests are responded to, and with what status they returned.
+
+### Actions
+
+The graphs break down different metrics about {% data variables.product.prodname_actions %} on {% data variables.location.product_location %}, including an overview of web requests to {% data variables.product.prodname_actions %} services.
+
+### Background jobs
+
+The background jobs graphs show the number of tasks queued for background processing on the {% data variables.product.prodname_ghe_server %} appliance.
+
+### Network
+
+The network interface graphs can be useful in profiling user activity, and throughput of traffic in and out of the {% data variables.product.prodname_ghe_server %} appliance.
+
+### Storage
+
+{% data variables.product.prodname_ghe_server %} repository performance is very dependent on the underlying storage system. Low latency, local SSD disks provide the highest performance. For more information on the {% data variables.product.prodname_enterprise %} storage architecture, see "[AUTOTITLE](/[email protected]/admin/overview/system-overview)."
+
+### Appliance-specific system services
+
+System services graphs contain data related to the major databases on {% data variables.product.prodname_ghe_server %}. These are the persistent MySQL and Elasticsearch databases, as well as Redis and Memcached, which contain ephemeral data.
+
+* Memcached: Provides a layer of in-memory caching for web and API operations. Memcached helps to provide quicker response times for users and integrations interacting with the system.
+* MySQL: The primary database in {% data variables.product.prodname_ghe_server %}. User, issue, and other metadata not related to Git or search is stored within MySQL.
+* Nomad Jobs: {% data variables.product.prodname_ghe_server %} utilizes Nomad internally as the workload orchestrator, where the CPU and memory usage of individual services can be seen.
+* Redis: Mainly contains the background job queue, as well as session state information.
+* Kafka-Lite: Kafka broker service for job processing.
+* Elasticsearch: Powers the built-in search features in {% data variables.product.prodname_ghe_server %}.
+* Custom hooks: Graphs related to pre-receive hook execution.
+* Git fetch caching: {% data variables.product.prodname_ghe_server %} will attempt to cache intensive operations, such as Git pack-objects, when multiple identical requests arrive in quick succession.
+* MinIO: Storage used by some {% data variables.product.prodname_ghe_server %} services.
+* Packages: Requests powering {% data variables.product.prodname_registry %}.
+* SecretScanning: Services powering {% data variables.product.prodname_secret_scanning_caps %} features.
+* CodeScanning: Services powering {% data variables.product.prodname_code_scanning_caps %} features.
+* Cluster: Graphs related to {% data variables.product.prodname_ghe_server %} high availability or clustering.
+* Babeld: Git proxy.
+* Alive: Service powering live updates.
+* ghes-manage: Service powering the GHES Manage API.
diff --git a/...t/admin/monitoring-and-managing-your-instance/monitoring-your-instance/index.md b/...t/admin/monitoring-and-managing-your-instance/monitoring-your-instance/index.md
@@ -21,6 +21,7 @@ children:
  - /collectd-metrics-for-github-enterprise-server
  - /monitoring-using-snmp
  - /about-system-logs
+ - /troubleshooting-resource-allocation-problems
  - /generating-a-health-check-for-your-enterprise
 shortTitle: Monitor your instance
 ---

diff --git a/...managing-your-instance/monitoring-your-instance/recommended-alert-thresholds.md b/...managing-your-instance/monitoring-your-instance/recommended-alert-thresholds.md
@@ -24,37 +24,39 @@ shortTitle: Recommended alert thresholds
 
 ## About recommended alert thresholds
 
-You can configure external monitoring systems to alert you to storage, CPU, and memory usage that may cause problems with {% data variables.location.product_location %}. For more information, see "[AUTOTITLE](/admin/enterprise-management/monitoring-your-appliance/setting-up-external-monitoring)."
+You can configure external monitoring systems to alert you to storage, CPU, and memory usage that may cause problems with {% data variables.location.product_location %}. For more information, see "[AUTOTITLE](/admin/enterprise-management/monitoring-your-appliance/setting-up-external-monitoring)" and "[AUTOTITLE](/admin/monitoring-and-managing-your-instance/monitoring-your-instance/accessing-the-monitor-dashboard)."
 
 ## Monitoring storage
 
 We recommend that you monitor both the root and user storage devices and configure an alert with values that allow for ample response time when available disk space is low.
 
 | Severity | Threshold |
 | -------- | --------- |
-| **Warning** | Disk use exceeds 70% of total available |
-| **Critical** | Disk use exceeds 85% of total available |
+| **Warning** | Disk use exceeds 60% of total available |
+| **Critical** | Disk use exceeds 75% of total available |
 
 You can adjust these values based on the total amount of storage allocated, historical growth patterns, and expected time to respond. We recommend over-allocating storage resources to allow for growth and prevent the downtime required to allocate additional storage.
 
 ## Monitoring CPU and load average usage
 
-Although it is normal for CPU usage to fluctuate based on resource-intense Git operations, we recommend configuring an alert for abnormally high CPU utilization, as prolonged spikes can mean your instance is under-provisioned. We recommend monitoring the fifteen-minute system load average for values nearing or exceeding the number of CPU cores allocated to the virtual machine.
+Although it is normal for CPU usage to fluctuate based on resource-intense Git operations, we recommend configuring an alert for abnormally high CPU utilization, as prolonged spikes can mean your instance is under-provisioned. Additionally, we recommend monitoring CPU utilization during a regular work week when the instance is in a healthy state to establish a baseline that can be used as a reference.
 
 | Severity | Threshold |
 | -------- | --------- |
-| **Warning** | Fifteen minute load average exceeds 1x CPU cores |
-| **Critical** | Fifteen minute load average exceeds 2x CPU cores |
+| **Warning** | 20% above the baseline |
+| **Critical** | 40% above the baseline |
 
 We also recommend that you monitor virtualization "steal" time to ensure that other virtual machines running on the same host system are not using all of the instance's resources.
 
 ## Monitoring memory usage
 
-The amount of physical memory allocated to {% data variables.location.product_location %} can have a large impact on overall performance and application responsiveness. The system is designed to make heavy use of the kernel disk cache to speed up Git operations. We recommend that the normal RSS working set fit within 50% of total available RAM at peak usage.
+The amount of physical memory allocated to {% data variables.location.product_location %} can have a large impact on overall performance and application responsiveness. The system is designed to make heavy use of the kernel disk cache to speed up Git operations. We recommend that the amount of physical memory assigned to the processes fit within 50% of total available RAM at peak usage.
 
 | Severity | Threshold |
 | -------- | --------- |
-| **Warning** | Sustained RSS usage exceeds 50% of total available memory |
-| **Critical** | Sustained RSS usage exceeds 70% of total available memory |
+| **Warning** | Sustained memory usage exceeds 50% of total available memory |
+| **Critical** | Sustained memory usage exceeds 70% of total available memory |
+
+Nevertheless, for cluster installations, we recommend following a similar approach to CPU monitoring: establish a baseline that defines what is considered normal usage, and set the threshold accordingly. This threshold may also vary between roles.
 
 If memory is exhausted, the kernel OOM killer will attempt to free memory resources by forcibly killing RAM-heavy application processes, which could result in a disruption of service. We recommend allocating more memory to the virtual machine than is required in the normal course of operations.
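
The updated thresholds in `recommended-alert-thresholds.md` can be sketched as a small severity check. This is only an illustration, not part of the commit or of any {% data variables.product.prodname_ghe_server %} tooling: the names `Threshold`, `classify`, and `cpu_severity` are hypothetical, and the sketch assumes that "20% above the baseline" means a relative increase over the baseline value rather than percentage points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Warning and critical limits, both expressed as percentages."""
    warning: float
    critical: float


# Values taken from the "+" lines of the diff above.
STORAGE = Threshold(warning=60.0, critical=75.0)   # % of total disk
MEMORY = Threshold(warning=50.0, critical=70.0)    # % of total memory
# Assumption: relative increase over a baseline measured in a healthy week.
CPU_ABOVE_BASELINE = Threshold(warning=20.0, critical=40.0)


def classify(value_pct: float, threshold: Threshold) -> str:
    """Return "critical", "warning", or "ok" for a percentage value.

    The docs say "exceeds", so the comparison is strictly greater than.
    """
    if value_pct > threshold.critical:
        return "critical"
    if value_pct > threshold.warning:
        return "warning"
    return "ok"


def cpu_severity(current_pct: float, baseline_pct: float) -> str:
    """Classify CPU utilization relative to a previously recorded baseline."""
    if baseline_pct <= 0:
        raise ValueError("baseline must be a positive percentage")
    pct_above = (current_pct - baseline_pct) / baseline_pct * 100.0
    return classify(pct_above, CPU_ABOVE_BASELINE)


if __name__ == "__main__":
    print(classify(61.0, STORAGE))    # disk at 61% of total
    print(classify(72.0, MEMORY))     # sustained memory at 72%
    print(cpu_severity(50.0, 40.0))   # CPU at 50% against a 40% baseline
```

In practice these checks would live in an external monitoring system fed by `collectd` forwarding. For cluster installations, the memory limits would likely be derived from a per-role baseline as well, as the added paragraph in the diff suggests.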