
What logs need to be truncated
Hi All,
I seem to keep finding new logs/working files that I need to clean. This is mostly an issue on my dev system, since it does not have as much disk allocated as the others, but once I find a large directory on dev, it usually turns out to be large in my other environments as well.
So far I have the following:
Analytics server:
/opt/incorta/IncortaAnalytics/IncortaNode/services/INC_ANA_SID/logs/kafka
/opt/incorta/IncortaAnalytics/IncortaNode/services/INC_ANA_SID/logs/incorta
/opt/incorta/IncortaAnalytics/IncortaNode/services/INC_ANA_SID/logs/
Loader server (also my Spark server):
/opt/incorta/IncortaAnalytics/IncortaNode/services/INC_LOAD_SID/logs/kafka
/opt/incorta/IncortaAnalytics/IncortaNode/services/INC_LOAD_SID/logs/incorta
/opt/incorta/IncortaAnalytics/IncortaNode/services/INC_LOAD_SID/logs/
/opt/incorta/IncortaAnalytics/IncortaNode/spark/eventlogs
/opt/incorta/IncortaAnalytics/IncortaNode/spark/work
I previously posted my script at https://community.incorta.com/t/35hhpn5/log-retention, but I keep finding new places to clean. The other issue is determining an appropriate retention period for the logs. The "spark/work" folder alone had just over 5 GB of files, and some of the other folders had even more.
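In case it helps, here is a minimal sketch of the kind of cron-able cleanup pass I mean. The 14-day retention is an assumption, the path list just mirrors the folders above (keep your own SID values), and on a first run you should swap -delete for -print to preview what would be removed:

#!/usr/bin/env bash
# Sketch of a periodic log cleanup; the 14-day retention is an assumption.
set -euo pipefail

RETENTION_DAYS=14
NODE=/opt/incorta/IncortaAnalytics/IncortaNode

# Report current usage first, which also helps spot new offenders.
du -sh "$NODE"/services/*/logs "$NODE"/spark/eventlogs "$NODE"/spark/work

# Remove files older than the retention window from each location.
# Swap -delete for -print on a dry run to see what would be deleted.
for dir in \
    "$NODE"/services/INC_ANA_SID/logs \
    "$NODE"/services/INC_LOAD_SID/logs \
    "$NODE"/spark/eventlogs \
    "$NODE"/spark/work; do
    find "$dir" -type f -mtime +"$RETENTION_DAYS" -delete
done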
Does anyone have other folders or files they are cleaning?
Thanks
-
Cleaning up Spark Log Files
If you have MVs (materialized views), Spark log files can occupy a huge amount of disk space over time. By default, Spark does not regularly clean these files up. Change the following Spark properties in spark-defaults.conf to values that support your planned activity, and monitor these settings over time; a sample configuration follows the property descriptions.
spark.worker.cleanup.enabled
Enables periodic cleanup of worker and application directories. This is disabled by default; set it to true to enable it. Note that only the directories of stopped applications are cleaned up.
spark.worker.cleanup.interval
The frequency, in seconds, at which the worker cleans up old application work directories. The default is 1800 (30 minutes). Modify the value as you deem appropriate.
spark.worker.cleanup.appDataTtl
Controls how long, in seconds, to retain application work directories. The default is 604800 (7 days), which generally keeps too much if Spark jobs run frequently. Modify the value as you deem appropriate.
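For example, a spark-defaults.conf fragment along these lines enables the cleanup (the one-day TTL here is an illustration, not a recommendation; pick values matching how often you run jobs). The worker reads this file at startup, so restart the Spark worker after editing it:

# Enable periodic cleanup of stopped applications' work directories
spark.worker.cleanup.enabled      true
# Check for expired directories every 1800 seconds (30 minutes)
spark.worker.cleanup.interval     1800
# Keep application work directories for 86400 seconds (1 day)
spark.worker.cleanup.appDataTtl   86400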